Fitting the parameters of a Bayesian network

Learning the network structure

For this example we will initially use the learning.test data set shipped with bnlearn. Its network structure can be learned with any of the structure learning algorithms implemented in bnlearn; we will use IAMB in the following.

> library(bnlearn)
Loading required package: methods
> data(learning.test)
> pdag = iamb(learning.test)
> pdag

  Bayesian network learned via Constraint-based methods

  model:
    [partially directed graph]
  nodes:                                 6 
  arcs:                                  5 
    undirected arcs:                     1 
    directed arcs:                       4 
  average markov blanket size:           2.33 
  average neighbourhood size:            1.67 
  average branching factor:              0.67 

  learning algorithm:                    IAMB 
  conditional independence test:         Mutual Information (disc.) 
  alpha threshold:                       0.05 
  tests used in the learning procedure:  94 
  optimized:                             FALSE

As we can see from the output above, there is a single undirected arc in pdag; IAMB was not able to set its orientation because its two possible directions are score equivalent.

> score(set.arc(pdag, from = "A", to = "B"), learning.test)
[1] -24006.73
> score(set.arc(pdag, from = "B", to = "A"), learning.test)
[1] -24006.73
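The undirected arc can also be identified programmatically rather than by reading the printed summary. A minimal sketch using bnlearn's undirected.arcs() function, which lists each undirected arc once per direction (so a single undirected arc appears as two rows):

```r
library(bnlearn)

data(learning.test)
pdag = iamb(learning.test)
# each undirected arc is reported in both directions,
# so the single A - B arc yields two rows.
undirected.arcs(pdag)
```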

Setting the direction of undirected arcs

Due to the way Bayesian networks are defined, the network structure must be a directed acyclic graph (DAG); otherwise the parameters cannot be estimated, because the factorization of the global probability distribution of the data into the local distributions (one for each variable in the model) is not completely known.

If a network structure contains one or more undirected arcs, their directions can be set in two ways:

  • with the set.arc() function, if the direction of the arc is known or can be guessed from the experimental setting of the data:
    > dag = set.arc(pdag, from = "B", to = "A")
    
  • with the pdag2dag() function, if the topological ordering of the nodes is known or (again) if it can be guessed in terms of causal relationships from the experimental setting:
    > dag = pdag2dag(pdag, ordering = c("A", "B", "C", "D", "E", "F"))
    

Fitting the parameters (Maximum Likelihood estimates)

Discrete data

Now that the Bayesian network structure is completely directed, we can fit the parameters of the local distributions, which take the form of conditional probability tables.

> fit = bn.fit(dag, learning.test)
> fit

  Bayesian network parameters

  Parameters of node A (multinomial distribution)

Conditional probability table:
     a     b     c 
0.334 0.334 0.332 

  Parameters of node B (multinomial distribution)

Conditional probability table:
 
   A
B        a      b      c
  a 0.8561 0.4449 0.1149
  b 0.0252 0.2210 0.0945
  c 0.1187 0.3341 0.7906

  Parameters of node C (multinomial distribution)

Conditional probability table:
      a      b      c 
0.7434 0.2048 0.0518 

  Parameters of node D (multinomial distribution)

Conditional probability table:
 
, , C = a

   A
D        a      b      c
  a 0.8008 0.0925 0.1053
  b 0.0902 0.8021 0.1117
  c 0.1089 0.1054 0.7830

, , C = b

   A
D        a      b      c
  a 0.1808 0.8830 0.2470
  b 0.1328 0.0702 0.4939
  c 0.6864 0.0468 0.2591

, , C = c

   A
D        a      b      c
  a 0.4286 0.3412 0.1333
  b 0.2024 0.3882 0.4444
  c 0.3690 0.2706 0.4222


  Parameters of node E (multinomial distribution)

Conditional probability table:
 
, , F = a

   B
E        a      b      c
  a 0.8052 0.2059 0.1194
  b 0.0974 0.1797 0.1145
  c 0.0974 0.6144 0.7661

, , F = b

   B
E        a      b      c
  a 0.4005 0.3168 0.2376
  b 0.4903 0.3664 0.5067
  c 0.1092 0.3168 0.2557


  Parameters of node F (multinomial distribution)

Conditional probability table:
     a     b 
0.502 0.498
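As a sanity check (a sketch, not part of the original workflow): for discrete nodes the maximum likelihood estimates are simply conditional relative frequencies, so the CPT of B above should match the sample proportions computed directly from the data with prop.table():

```r
library(bnlearn)

data(learning.test)
dag = set.arc(iamb(learning.test), from = "A", to = "B")
fit = bn.fit(dag, learning.test)
# conditional relative frequencies of B given A; each column sums to 1
# and should reproduce the CPT of node B printed by bn.fit().
prop.table(table(learning.test$B, learning.test$A), margin = 2)
```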

The conditional probability table of each variable can be accessed with the usual $ operator:

> fit$D

  Parameters of node D (multinomial distribution)

Conditional probability table:
 
, , C = a

   A
D        a      b      c
  a 0.8008 0.0925 0.1053
  b 0.0902 0.8021 0.1117
  c 0.1089 0.1054 0.7830

, , C = b

   A
D        a      b      c
  a 0.1808 0.8830 0.2470
  b 0.1328 0.0702 0.4939
  c 0.6864 0.0468 0.2591

, , C = c

   A
D        a      b      c
  a 0.4286 0.3412 0.1333
  b 0.2024 0.3882 0.4444
  c 0.3690 0.2706 0.4222

and it can be print()ed with its dimensions arranged in various ways via the perm argument.

> print(fit$D, perm = c("D", "C", "A"))

  Parameters of node D (multinomial distribution)

Conditional probability table:
 
, , A = a

   C
D        a      b      c
  a 0.8008 0.1808 0.4286
  b 0.0902 0.1328 0.2024
  c 0.1089 0.6864 0.3690

, , A = b

   C
D        a      b      c
  a 0.0925 0.8830 0.3412
  b 0.8021 0.0702 0.3882
  c 0.1054 0.0468 0.2706

, , A = c

   C
D        a      b      c
  a 0.1053 0.2470 0.1333
  b 0.1117 0.4939 0.4444
  c 0.7830 0.2591 0.4222

It can also be plotted with the bn.fit.barchart() and bn.fit.dotplot() functions (manual page).

> bn.fit.barchart(fit$D)
Loading required namespace: lattice
> bn.fit.dotplot(fit$D)

Continuous data

Fitting the parameters of a Gaussian Bayesian network (e.g. the regression coefficients for each variable against its parents) is done in the same way.

> data(gaussian.test)
> pdag = iamb(gaussian.test)
> undirected.arcs(pdag)
     from to 
[1,] "B"  "D"
[2,] "D"  "B"
> dag = set.arc(pdag, "D", "B")
> fit = bn.fit(dag, gaussian.test)
> fit

  Bayesian network parameters

  Parameters of node A (Gaussian distribution)

Conditional density: A
Coefficients:
(Intercept)  
       1.01  
Standard deviation of the residuals: 1 

  Parameters of node B (Gaussian distribution)

Conditional density: B | D
Coefficients:
(Intercept)            D  
     -3.970        0.664  
Standard deviation of the residuals: 0.219 

  Parameters of node C (Gaussian distribution)

Conditional density: C | A + B
Coefficients:
(Intercept)            A            B  
          2            2            2  
Standard deviation of the residuals: 0.509 

  Parameters of node D (Gaussian distribution)

Conditional density: D
Coefficients:
(Intercept)  
       9.05  
Standard deviation of the residuals: 4.56 

  Parameters of node E (Gaussian distribution)

Conditional density: E
Coefficients:
(Intercept)  
       3.49  
Standard deviation of the residuals: 1.99 

  Parameters of node F (Gaussian distribution)

Conditional density: F | A + D + E + G
Coefficients:
(Intercept)            A            D            E            G  
   -0.00605      1.99485      1.00564      1.00258      1.49437  
Standard deviation of the residuals: 0.996 

  Parameters of node G (Gaussian distribution)

Conditional density: G
Coefficients:
(Intercept)  
       5.03  
Standard deviation of the residuals: 1.98

The functions coefficients(), fitted() and residuals() (manual page) allow an easy extraction of the quantities of interest from both a single variable and the whole network.

> coefficients(fit$F)
(Intercept)           A           D           E           G 
   -0.00605     1.99485     1.00564     1.00258     1.49437
> str(residuals(fit$F))
 num [1:5000] -0.861 1.271 -0.262 -0.479 -0.782 ...
> str(fitted(fit$F))
 num [1:5000] 25.6 35.5 22.3 23.8 25.3 ...
> str(fitted(fit))
List of 7
 $ A: num [1:5000] 1.01 1.01 1.01 1.01 1.01 ...
 $ B: num [1:5000] 1.78 11.54 3.37 3.96 4.35 ...
 $ C: num [1:5000] 8.09 24.16 11.76 11.38 12 ...
 $ D: num [1:5000] 9.05 9.05 9.05 9.05 9.05 ...
 $ E: num [1:5000] 3.49 3.49 3.49 3.49 3.49 ...
 $ F: num [1:5000] 25.6 35.5 22.3 23.8 25.3 ...
 $ G: num [1:5000] 5.03 5.03 5.03 5.03 5.03 ...
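As a cross-check (a sketch under the assumption that the same parents are used): the maximum likelihood estimates for a Gaussian node coincide with ordinary least squares, so fitting the same regression with lm() should reproduce the coefficients of F; the residual standard deviations may differ slightly because of the denominator used.

```r
library(bnlearn)

data(gaussian.test)
dag = set.arc(iamb(gaussian.test), "D", "B")
fit = bn.fit(dag, gaussian.test)
# the coefficients from lm() should match those stored in fit$F.
coef(lm(F ~ A + D + E + G, data = gaussian.test))
coef(fit$F)
```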

As before, there are some functions to plot these quantities: bn.fit.qqplot(), bn.fit.xyplot() and bn.fit.histogram() (see their manual page). In addition to single-node plots such as those shown above, in the case of Gaussian Bayesian networks we can plot all the nodes in a single figure.

> bn.fit.qqplot(fit)
> bn.fit.xyplot(fit)
> bn.fit.histogram(fit)

Hybrid data (mixed discrete and continuous)

The parameters of a conditional linear Gaussian network take the form of:

  • conditional probability tables for discrete nodes, which can only have other discrete nodes as parents;
  • collections of linear regressions for Gaussian nodes.

In the latter case, there is one regression for each configuration of the discrete parents of the node; each regression has all the Gaussian parents of the node as explanatory variables.

> data(clgaussian.test)
> dag = hc(clgaussian.test)
> fit = bn.fit(dag, clgaussian.test)
> fit

  Bayesian network parameters

  Parameters of node A (multinomial distribution)

Conditional probability table:
      a      b 
0.0948 0.9052 

  Parameters of node B (multinomial distribution)

Conditional probability table:
     a     b     c 
0.410 0.188 0.402 

  Parameters of node C (multinomial distribution)

Conditional probability table:
     a     b     c     d 
0.249 0.251 0.398 0.102 

  Parameters of node D (conditional Gaussian distribution)

Conditional density: D | A + H
Coefficients:
                  0       1
(Intercept)   5.292  10.041
H             0.882   0.983
Standard deviation of the residuals:
    0      1  
0.510  0.307  
Discrete parents' configurations:
   A
0  a
1  b

  Parameters of node E (conditional Gaussian distribution)

Conditional density: E | B + D
Coefficients:
                 0      1      2
(Intercept)  0.995  4.344  7.919
D            2.352  1.151  0.674
Standard deviation of the residuals:
    0      1      2  
0.508  0.992  1.519  
Discrete parents' configurations:
   B
0  a
1  b
2  c

  Parameters of node F (multinomial distribution)

Conditional probability table:
 
, , C = a

   B
F        a      b      c
  a 0.0788 0.1498 0.5116
  b 0.9212 0.8502 0.4884

, , C = b

   B
F        a      b      c
  a 0.4553 0.2629 0.4773
  b 0.5447 0.7371 0.5227

, , C = c

   B
F        a      b      c
  a 0.6791 0.4230 0.7494
  b 0.3209 0.5770 0.2506

, , C = d

   B
F        a      b      c
  a 0.8571 0.4545 0.7368
  b 0.1429 0.5455 0.2632


  Parameters of node G (conditional Gaussian distribution)

Conditional density: G | A + D + E + F
Coefficients:
                0     1     2     3
(Intercept)  4.95  2.38  3.49  2.14
D            2.25  4.07  2.99  5.81
E            1.00  1.00  1.00  1.00
Standard deviation of the residuals:
     0       1       2       3  
0.0546  0.1495  0.2627  0.3518  
Discrete parents' configurations:
   A  F
0  a  a
1  b  a
2  a  b
3  b  b

  Parameters of node H (Gaussian distribution)

Conditional density: H
Coefficients:
(Intercept)  
       2.34  
Standard deviation of the residuals: 0.121

The functions coefficients(), fitted() and residuals() (manual page) work as before. Plotting functions can be applied to single nodes to produce plots analogous to those above. In the case of Gaussian nodes, one panel is produced for every configuration of the discrete parents; the labels match those in the output of print().

> bn.fit.qqplot(fit$G)
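For conditional Gaussian nodes, coef() returns the regression coefficients as a matrix with one column per configuration of the discrete parents, matching the columns in the printed output above; a short sketch (the residual standard deviations are stored in the sd component of the node):

```r
library(bnlearn)

data(clgaussian.test)
dag = hc(clgaussian.test)
fit = bn.fit(dag, clgaussian.test)
# one column per configuration of the discrete parents (A, F) of G.
coef(fit$G)
fit$G$sd  # the matching residual standard deviations
```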

Fitting the parameters (Bayesian Posterior estimates)

Discrete data

As an alternative to classic maximum likelihood approaches, we can also fit the parameters of the network in a Bayesian way using the expected value of the posterior distribution. The only difference from the workflow illustrated above is that method = "bayes" must be specified in bn.fit().

> pdag = iamb(learning.test)
> dag = set.arc(pdag, from = "A", to = "B")
> fit = bn.fit(dag, learning.test, method = "bayes")
> fit

  Bayesian network parameters

  Parameters of node A (multinomial distribution)

Conditional probability table:
     a     b     c 
0.334 0.334 0.332 

  Parameters of node B (multinomial distribution)

Conditional probability table:
 
   A
B        a      b      c
  a 0.8560 0.4449 0.1150
  b 0.0252 0.2210 0.0945
  c 0.1187 0.3341 0.7905

  Parameters of node C (multinomial distribution)

Conditional probability table:
      a      b      c 
0.7433 0.2048 0.0519 

  Parameters of node D (multinomial distribution)

Conditional probability table:
 
, , C = a

   A
D        a      b      c
  a 0.8008 0.0925 0.1053
  b 0.0903 0.8020 0.1118
  c 0.1090 0.1054 0.7829

, , C = b

   A
D        a      b      c
  a 0.1808 0.8829 0.2470
  b 0.1328 0.0703 0.4938
  c 0.6863 0.0469 0.2592

, , C = c

   A
D        a      b      c
  a 0.4284 0.3412 0.1336
  b 0.2026 0.3882 0.4443
  c 0.3690 0.2707 0.4221


  Parameters of node E (multinomial distribution)

Conditional probability table:
 
, , F = a

   B
E        a      b      c
  a 0.8052 0.2060 0.1194
  b 0.0974 0.1798 0.1145
  c 0.0974 0.6142 0.7661

, , F = b

   B
E        a      b      c
  a 0.4005 0.3168 0.2376
  b 0.4902 0.3664 0.5067
  c 0.1093 0.3168 0.2557


  Parameters of node F (multinomial distribution)

Conditional probability table:
     a     b 
0.502 0.498

The imaginary (or equivalent) sample size of the prior distribution can be specified with the iss argument of bn.fit(); it determines the weight of the prior relative to the sample, and thus controls the smoothness of the posterior distribution.
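The effect of iss can be seen directly; a sketch comparing a small and a deliberately large imaginary sample size (the extreme value 5000 is chosen purely for illustration) on the marginal distribution of A, whose posterior estimates are pulled towards the uniform prior as iss grows:

```r
library(bnlearn)

data(learning.test)
dag = set.arc(iamb(learning.test), from = "A", to = "B")
weak   = bn.fit(dag, learning.test, method = "bayes", iss = 1)
strong = bn.fit(dag, learning.test, method = "bayes", iss = 5000)
coef(weak$A)
coef(strong$A)  # smoother: closer to the uniform 1/3, 1/3, 1/3
```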

Last updated on Tue Nov 14 15:40:44 2017 with bnlearn 4.3-20171001 and R version 3.0.2 (2013-09-25).