Bayesian networks and cross-validation

Choosing a Bayesian network learning strategy

Cross-validation is a standard way to obtain unbiased estimates of a model's goodness of fit. By comparing such estimates for different learning strategies (different combinations of learning algorithms, fitting techniques and the respective parameters) we can choose the optimal one for the data at hand in a principled way.

bnlearn implements three cross-validation methods in the bn.cv() function (documented here):

  1. k-fold cross-validation (the default): the data are randomly partitioned into k subsets. Each subset is used in turn to validate the model fitted on the remaining k - 1 subsets.
  2. Custom folds cross-validation: the data are manually partitioned into subsets by the user into subsets, which are then used as in k-fold cross-validation. Subsets are not constrained to have the same size.
  3. Hold-out cross-validation: the data are repeatedly (randomly) partitioned in a training and test subset of given size m and n - m, with observations assigned randomly to each subset at each repetition. Each test subset is used to validate the model fitted on the corresponding training subset.

bn.cv() requires the following parameters for all methods:

  1. method: the label of the cross-validation method, either "k-fold" (the default), "custom-fold" or "hold-out";
  2. data: the data set the Bayesian network will be estimated from;
  3. bn: the structure learning algorithm to be used, or a fixed network structure (to cross-validate just parameter learning);
  4. loss:: the loss function used to evaluate the goodness of fit of the learned network;
  5. fit: the fitting method (such as mle or bayes) to be used to fit the parameters.

Additional parameters for all these components can be specified as well, providing a fine-grained control of the learning strategy, using the following arguments of bn.cv():

  • algorithm.args for additional arguments to be passed to the learning algorithm;
  • loss.args for additional arguments to be passed to the loss function;
  • fit.args for additional arguments to be passed to bn.fit() for fitting the parameters of the Bayesian network.

k-fold cross-validation

For example, let's compare the performance of three different scores in a hill climbing search on the ALARM data set included in bnlearn. The loss funtion we use is the so-called log-likelihood loss, which is the negated expected loglikelihood; so lower values are better. Hill cimbing is implemented in the hc() function.

> library(bnlearn)
## Loading required package: methods
> data(alarm)
> bn.cv(alarm, bn = "hc", algorithm.args = list(score = "bde", iss = 1))

  k-fold cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing 
  number of folds:                       10 
  loss function:                         Log-Likelihood Loss (disc.) 
  expected loss:                         10.8272

In this case the bn argument specifies the label of a structure learning algorithm, and algorithm.args is an optional list of arguments for the algorithm specified by bn.

> bn.cv(alarm, bn = "hc", algorithm.args = list(score = "bde", iss = 10))

  k-fold cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing 
  number of folds:                       10 
  loss function:                         Log-Likelihood Loss (disc.) 
  expected loss:                         10.82355
> bn.cv(alarm, bn = "hc", algorithm.args = list(score = "bic"))

  k-fold cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing 
  number of folds:                       10 
  loss function:                         Log-Likelihood Loss (disc.) 
  expected loss:                         10.86796

Partially directed graphs obtained from learning are automatically extended to directed acyclic graphs with cextend, so that the parameters can be fit and the loss function computed.

> bn.cv(alarm, bn = "iamb")

  k-fold cross-validation for Bayesian networks

  target learning algorithm:             IAMB 
  number of folds:                       10 
  loss function:                         Log-Likelihood Loss (disc.) 
  expected loss:                         13.93915

Tipically, 10-fold cross-validation is performed 10 times for each learning strategy (golden standard originally introduced by Hastie, Tibshirani and Friedman in «The Elements of Statistical Learning», link), and the resulting sets of loss values compared by plotting them as boxplots. Below we do so that for the BIC and BDe scores using hc().

> cv.bic = bn.cv(alarm, bn = "hc", runs = 10,
+            algorithm.args = list(score = "bic"))
> cv.bde = bn.cv(alarm, bn = "hc", runs = 10,
+            algorithm.args = list(score = "bde", iss = 1))
> plot(cv.bic, cv.bde, xlab = c("BIC", "BDe"))
## Loading required namespace: lattice
plot of chunk unnamed-chunk-6

Note that, while it is customary to use k = 10, a different number of folds can be specified with the k argument.

> bn.cv(alarm, bn = "hc", k = 5)

  k-fold cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing 
  number of folds:                       5 
  loss function:                         Log-Likelihood Loss (disc.) 
  expected loss:                         10.86357

Custom folds in cross-validation

The custom-folds method works in the same way as k-fold cross-validation, but folds are specified manually with a folds argument.

> folds = list(1:5000, 5001:15000, 15001:20000)
> cv.custom = bn.cv(alarm, bn = "hc", algorithm.args = list(score = "bic"),
+               method = "custom-folds", folds = folds)
> cv.custom

  custom-folds cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing 
  loss function:                         Log-Likelihood Loss (disc.) 
  expected loss:                         10.85651

The two methods are otherwise identical, as shown below: the R objects returned by bn.cv() are identical (with the exception of the method's label) if the same folds are used.

> cv.random = bn.cv(alarm, bn = "hc", method = "k-fold")
> folds = lapply(cv.random, `[[`, "test")
> cv.custom = bn.cv(alarm, bn = "hc", method = "custom-folds",
+                 folds = folds)
> all.equal(cv.random, cv.custom, check.attributes = FALSE)
[1] TRUE

Note that:

  1. Each observation can only be included in a single fold.
    > folds = list(1:5000, 4000:10000, 10001:20000)
    > bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds)
    
    ## Error in check.single.run(folds = folds, n = n): some observations are included in more than one fold.
    
  2. Each fold must contain at least one observation.
    > folds = list(numeric(0), 1:2000, 2001:20000)
    > bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds)
    
    ## Error in check.single.run(folds = folds, n = n): some folds contain no observations.
    
  3. All observations must be assigned to the folds.
    > folds = list(1:1000, 2000:20000)
    > bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds)
    
    ## Error in check.single.run(folds = folds, n = n): not all observations are assigned to a fold.
    
  4. runs is not a valid argument for this cross-validation method, because the ability of performing multiple runs relies on splitting the data randomly into folds.

To perform multiple runs, each set of manually specified folds (one for each run) should be saved in a list.

> folds = list(
+   list(1:5000, 5001:10000, 10001:20000),
+   list(1:10000, 10001:15000, 15001:20000),
+   list(1:5000, 5001:15000, 15001:20000)
+ )
> bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds)

  custom-folds cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing 
  loss function:                         Log-Likelihood Loss (disc.) 
  number of runs:                        3 
  average loss over the runs:            10.86871 
  standard deviation of the loss:        0.01063207

The following function can be used to generate the folds randomly,

> random.folds = function(data, k) split(sample(nrow(data)), seq(k))

for a single run or cross-validation:

> folds = random.folds(data = alarm, k = 10)
> bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds)

  custom-folds cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing 
  loss function:                         Log-Likelihood Loss (disc.) 
  expected loss:                         10.86583

or for multiple runs.

> folds = replicate(10, random.folds(data = alarm, k = 10), simplify = FALSE)
> bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds)

  custom-folds cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing 
  loss function:                         Log-Likelihood Loss (disc.) 
  number of runs:                        10 
  average loss over the runs:            10.87163 
  standard deviation of the loss:        0.003236653

Hold-out cross-validation

The syntax for hold-out cross-validation is very similar to that for k-fold cross-validation, but:

  • method must be set to "hold-out";
  • k denotes the number of times the sample will be split into training and test subsamples (the default is 10);
  • and optionally m denotes the number of observations to be sampled for the test subsample (the default is 1/10th of the sample size).

So, for instance:

> bn.cv(alarm, bn = "hc", method = "hold-out", k = 25, m = 200,
+   algorithm.args = list(score = "bde", iss = 10))

  hold-out cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing 
  number of splits:                      25 
  size of the test subset:               200 
  loss function:                         Log-Likelihood Loss (disc.) 
  expected loss:                         10.73383

Comparing different network structures

bn.cv() can also be used to compare the goodness of fit of different network structures suggested by expert knowledge; using all the data to fit the parameters of those networks and to evaluate their goodness of fit produces optimistically biased results.

Let's start with the most obvious comparison: the true network structure against the empty one.

> res = empty.graph(names(alarm))
> modelstring(res) = paste("[HIST|LVF][CVP|LVV][PCWP|LVV][HYP][LVV|HYP:LVF]",
+   "[LVF][STKV|HYP:LVF][ERLO][HRBP|ERLO:HR][HREK|ERCA:HR][ERCA]",
+   "[HRSA|ERCA:HR][ANES][APL][TPR|APL][ECO2|ACO2:VLNG][KINK]",
+   "[MINV|INT:VLNG][FIO2][PVS|FIO2:VALV][SAO2|PVS:SHNT][PAP|PMB][PMB]",
+   "[SHNT|INT:PMB][INT][PRSS|INT:KINK:VTUB][DISC][MVS][VMCH|MVS]",
+   "[VTUB|DISC:VMCH][VLNG|INT:KINK:VTUB][VALV|INT:VLNG][ACO2|VALV]",
+   "[CCHL|ACO2:ANES:SAO2:TPR][HR|CCHL][CO|HR:STKV][BP|CO:TPR]", sep = "")
> bn.cv(alarm, res, algorithm.args = list(score = "aic"))

  k-fold cross-validation for Bayesian networks

  target network structure:
   [FIO2][MVS][HYP][LVF][APL][ANES][PMB][INT][KINK][DISC][ERLO][ERCA]
   [HIST|LVF][TPR|APL][PAP|PMB][LVV|HYP:LVF][STKV|HYP:LVF][SHNT|PMB:INT]
   [VMCH|MVS][CVP|LVV][PCWP|LVV][VTUB|DISC:VMCH][PRSS|INT:KINK:VTUB]
   [VLNG|INT:KINK:VTUB][MINV|INT:VLNG][VALV|INT:VLNG][PVS|FIO2:VALV]
   [ACO2|VALV][SAO2|SHNT:PVS][ECO2|ACO2:VLNG][CCHL|TPR:SAO2:ANES:ACO2]
   [HR|CCHL][CO|STKV:HR][HRBP|ERLO:HR][HREK|HR:ERCA][HRSA|HR:ERCA]
   [BP|TPR:CO]
  number of folds:                       10 
  loss function:                         Log-Likelihood Loss (disc.) 
  expected loss:                         10.81851
> bn.cv(alarm, empty.graph(names(alarm)))

  k-fold cross-validation for Bayesian networks

  target network structure:
   [CVP][PCWP][HIST][TPR][BP][CO][HRBP][HREK][HRSA][PAP][SAO2][FIO2][PRSS]
   [ECO2][MINV][MVS][HYP][LVF][APL][ANES][PMB][INT][KINK][DISC][LVV][STKV]
   [CCHL][ERLO][HR][ERCA][SHNT][PVS][ACO2][VALV][VLNG][VTUB][VMCH]
  number of folds:                       10 
  loss function:                         Log-Likelihood Loss (disc.) 
  expected loss:                         21.468

As expected, the first (true) network is a better fit than the second (empty network). The same is true for the following structure, which generated at random and has no relationship with the data.

> rand = random.graph(names(alarm))
> rand

  Random/Generated Bayesian network

  model:
   [CVP][PCWP][TPR][BP][CO][HRBP][HREK][PAP][PRSS][ECO2][MINV][LVF][APL]
   [ANES][KINK][HR][ERCA][HIST|PCWP][SAO2|HREK][PMB|PCWP][INT|PAP]
   [LVV|BP:ECO2:KINK][PVS|CO:HREK:LVF][VALV|ANES][VLNG|HRBP]
   [HRSA|PCWP:HIST:TPR][FIO2|HIST:PAP][DISC|HIST:APL][ERLO|SAO2:PMB]
   [VTUB|CVP:KINK:LVV][MVS|TPR:FIO2][STKV|PRSS:KINK:DISC][CCHL|DISC]
   [SHNT|HRSA][VMCH|PCWP:HRSA][HYP|FIO2:MVS][ACO2|HYP]
  nodes:                                 37 
  arcs:                                  36 
    undirected arcs:                     0 
    directed arcs:                       36 
  average markov blanket size:           2.86 
  average neighbourhood size:            1.95 
  average branching factor:              0.97 

  generation algorithm:                  Full Ordering 
  arc sampling probability:              0.05555556
> bn.cv(alarm, bn = rand)

  k-fold cross-validation for Bayesian networks

  target network structure:
   [CVP][PCWP][TPR][BP][CO][HRBP][HREK][PAP][PRSS][ECO2][MINV][LVF][APL]
   [ANES][KINK][HR][ERCA][HIST|PCWP][SAO2|HREK][PMB|PCWP][INT|PAP]
   [LVV|BP:ECO2:KINK][PVS|CO:HREK:LVF][VALV|ANES][VLNG|HRBP]
   [HRSA|PCWP:HIST:TPR][FIO2|HIST:PAP][DISC|HIST:APL][ERLO|SAO2:PMB]
   [VTUB|CVP:KINK:LVV][MVS|TPR:FIO2][STKV|PRSS:KINK:DISC][CCHL|DISC]
   [SHNT|HRSA][VMCH|PCWP:HRSA][HYP|FIO2:MVS][ACO2|HYP]
  number of folds:                       10 
  loss function:                         Log-Likelihood Loss (disc.) 
  expected loss:                         21.24157

Cross-validation and predictive error

Cross-validation is also commonly used in classification problems to compute the prediction error of a particular model. We can do that with bn.cv() by specifying

  • loss = "pred" to make frequentist predictions with predict(..., method = "parents"); or
  • loss = "pred-lw" to make Bayesian predictions with predict(..., method = "bayes-lw");

and passing the label of the target node to the loss function via the loss.args argument. The first predicts the target node from its parents; the second from all the available nodes or from a subset of nodes specified in loss.args list as element called from.

> xval = bn.cv(alarm, bn = "hc", loss = "pred", loss.args = list(target = "STKV"))
> xval

  k-fold cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing 
  number of folds:                       10 
  loss function:                         Classification Error 
  training node:                         STKV 
  expected loss:                         0.1745

We may also want to compute other statistics from that run of cross-validation, like sensitivity and specificity, or just to tabulate observed and predicted values. To that end, we can extract the observed and predicted values for each fold from xval as follows.

> OBS = unlist(lapply(xval, `[[`, "observed"))
> PRED = unlist(lapply(xval, `[[`, "predicted"))
> str(OBS)
 Factor w/ 3 levels "HIGH","LOW","NORMAL": 3 3 3 1 3 3 3 2 2 3 ...
> str(PRED)
 Factor w/ 3 levels "HIGH","LOW","NORMAL": 3 3 3 3 3 3 3 3 3 3 ...
> table(OBS, PRED)
        PRED
OBS       HIGH   LOW NORMAL
  HIGH       0     7    826
  LOW        0   950   2624
  NORMAL     0    33  15560

Cross-validation and predictive correlation

Predictive correlation and mean square error are used to evaluate Gaussian Bayesian networks in the same way as prediction error is used to evaluate discrete Bayesian networks. bn.cv() implements:

  1. loss = "cor" to compute the correlation between observed values and frequentist predictions from predict(..., method = "parents");
  2. loss = "cor-lw" to compute the correlation between observed values and frequentist predictions from predict(..., method = "bayes-lw");
  3. loss = "mse" to compute the correlation between observed values and frequentist predictions from predict(..., method = "parents");
  4. loss = "mse-lw" to compute the correlation between observed values and frequentist predictions from predict(..., method = "bayes-lw").
> bn.cv(gaussian.test, bn = "hc", loss = "cor", loss.args = list(target = "C"))

  k-fold cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing 
  number of folds:                       10 
  loss function:                         Predictive Correlation 
  training node:                         C 
  expected loss:                         0.9967891
> bn.cv(gaussian.test, bn = "hc", loss = "mse-lw", loss.args = list(target = "C"))

  k-fold cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing 
  number of folds:                       10 
  loss function:                         
                                      Mean Squared Error (Posterior, Gauss.) 
  training node:                         C 
  expected loss:                         0.2600608
Last updated on Sun Nov 5 19:42:02 2017 with bnlearn 4.3-20171001 and R version 3.0.2 (2013-09-25).