Parameter learning from data with missing values

Practical approaches to parameter learning assume that local distributions are independent from each other in order to estimate their parameters individually. In other words, the parameter estimates for any local distribution are computed independently from the parameter estimates for other local distributions. As a result, estimating the parameters of a local distribution only requires the data observed for the variables involved in that specific local distribution (the node and its parents).

This allows bn.fit() to handle missing values transparently by only using locally complete observations for each local distribution. For instance, in the case of discrete Bayesian networks we have:

> dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
> missing = matrix(FALSE, nrow(learning.test), ncol(learning.test))
> missing[sample(length(missing), 1000)] = TRUE
> incomplete = learning.test
> incomplete[missing] = NA
> fitted = bn.fit(dag, incomplete)

To estimate the conditional probability table of B given A,

> fitted$B

  Parameters of node B (multinomial distribution)

Conditional probability table:
 
   A
B            a          b          c
  a 0.85649936 0.44887460 0.11595131
  b 0.02445302 0.21929260 0.09481102
  c 0.11904762 0.33183280 0.78923767

the relevant data can be summarized as:

> table(incomplete[, c("B", "A")], useNA = "always")
      A
B         a    b    c <NA>
  a    1331  698  181   78
  b      38  341  148   24
  c     185  516 1232   62
  <NA>   47   63   52    4

and the maximum likelihood estimates of the conditional probabilities are computed from the counts arising from the observations in which both A and B are observed.

> prop.table(table(incomplete[, c("B", "A")], useNA = "no"), "A")
   A
B            a          b          c
  a 0.85649936 0.44887460 0.11595131
  b 0.02445302 0.21929260 0.09481102
  c 0.11904762 0.33183280 0.78923767

Bayesian estimates of conditional probabilities are computed in the same way, but involving a prior distribution.

As a second example, in the case of Gaussian Bayesian networks we have:

> dag = model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]")
> missing = matrix(FALSE, nrow(gaussian.test), ncol(gaussian.test))
> missing[sample(length(missing), 1000)] = TRUE
> incomplete = gaussian.test
> incomplete[missing] = NA
> fitted = bn.fit(dag, incomplete)

and the maximum likelihood estimates of the regression coefficients in the local distribution of C against A and B are identical to those produced by lm() when na.action = na.omit.

> fitted$C

  Parameters of node C (Gaussian distribution)

Conditional density: C | A + B
Coefficients:
(Intercept)            A            B  
   2.002181     1.992764     1.999319  
Standard deviation of the residuals: 0.5077919
> lm(C ~ A + B, data = incomplete, na.action = na.omit)

Call:
lm(formula = C ~ A + B, data = incomplete, na.action = na.omit)

Coefficients:
(Intercept)            A            B  
      2.002        1.993        1.999
Last updated on Sat Sep 4 18:24:41 2021 with bnlearn 4.7-20210901 and R version 4.1.1 (2021-08-10).