## Bayesian networks and cross-validation

### Choosing a Bayesian network learning strategy

Cross-validation is a standard way to obtain unbiased estimates of a model's goodness of fit. By comparing such estimates for different learning strategies (different combinations of learning algorithms, fitting techniques and the respective parameters) we can choose the optimal one for the data at hand in a principled way.

**bnlearn** implements three cross-validation methods in the `bn.cv()`

function (documented here):

*k-fold cross-validation*(the default): the data are randomly partitioned into*k*subsets. Each subset is used in turn to validate the model fitted on the remaining*k - 1*subsets.*Custom folds cross-validation*: the data are manually partitioned into subsets by the user into subsets, which are then used as in*k*-fold cross-validation. Subsets are not constrained to have the same size.*Hold-out cross-validation*: the data are repeatedly (randomly) partitioned in a training and test subset of given size*m*and*n - m*, with observations assigned randomly to each subset at each repetition. Each test subset is used to validate the model fitted on the corresponding training subset.

`bn.cv()`

requires the following parameters for all methods:

`method`

: the label of the cross-validation method, either`"k-fold"`

(the default),`"custom-fold"`

or`"hold-out"`

;`data`

: the data set the Bayesian network will be estimated from;`bn`

: the structure learning algorithm to be used, or a fixed network structure (to cross-validate just parameter learning);`loss:`

: the loss function used to evaluate the goodness of fit of the learned network;`fit`

: the fitting method (such as`mle`

or`bayes`

) to be used to fit the parameters.

Additional parameters for all these components can be specified as well, providing a fine-grained
control of the learning strategy, using the following arguments of `bn.cv()`

:

`algorithm.args`

for additional arguments to be passed to the learning algorithm;`loss.args`

for additional arguments to be passed to the loss function;`fit.args`

for additional arguments to be passed to`bn.fit()`

for fitting the parameters of the Bayesian network.

*k*-fold cross-validation

For example, let's compare the performance of three different scores in a hill climbing search
on the ALARM data set included in
**bnlearn**. The loss funtion we use is the so-called *log-likelihood loss*,
which is the negated expected loglikelihood; so lower values are better. Hill cimbing is implemented
in the `hc()`

function.

> library(bnlearn)

> data(alarm) > bn.cv(alarm, bn = "hc", algorithm.args = list(score = "bde", iss = 1))

k-fold cross-validation for Bayesian networks target learning algorithm: Hill-Climbing number of folds: 10 loss function: Log-Likelihood Loss (disc.) expected loss: 10.8272

In this case the `bn`

argument specifies the label of a structure learning algorithm,
and `algorithm.args`

is an optional list of arguments for the algorithm specified by
`bn`

.

> bn.cv(alarm, bn = "hc", algorithm.args = list(score = "bde", iss = 10))

k-fold cross-validation for Bayesian networks target learning algorithm: Hill-Climbing number of folds: 10 loss function: Log-Likelihood Loss (disc.) expected loss: 10.82355

> bn.cv(alarm, bn = "hc", algorithm.args = list(score = "bic"))

k-fold cross-validation for Bayesian networks target learning algorithm: Hill-Climbing number of folds: 10 loss function: Log-Likelihood Loss (disc.) expected loss: 10.86796

Partially directed graphs obtained from learning are automatically extended to directed acyclic
graphs with `cextend`

, so that the parameters can be fit and the loss function computed.

> bn.cv(alarm, bn = "iamb")

k-fold cross-validation for Bayesian networks target learning algorithm: IAMB number of folds: 10 loss function: Log-Likelihood Loss (disc.) expected loss: 13.93915

Tipically, 10-fold cross-validation is performed 10 times for each learning strategy (golden
standard originally introduced by Hastie, Tibshirani and Friedman in «The Elements of Statistical
Learning», link),
and the resulting sets of loss values compared by plotting them as boxplots. Below we do so that
for the BIC and BDe scores using `hc()`

.

> cv.bic = bn.cv(alarm, bn = "hc", runs = 10, + algorithm.args = list(score = "bic")) > cv.bde = bn.cv(alarm, bn = "hc", runs = 10, + algorithm.args = list(score = "bde", iss = 1)) > plot(cv.bic, cv.bde, xlab = c("BIC", "BDe"))

Note that, while it is customary to use `k = 10`

, a different number of folds can be
specified with the `k`

argument.

> bn.cv(alarm, bn = "hc", k = 5)

k-fold cross-validation for Bayesian networks target learning algorithm: Hill-Climbing number of folds: 5 loss function: Log-Likelihood Loss (disc.) expected loss: 10.86357

#### Custom folds in cross-validation

The `custom-folds`

method works in the same way as *k*-fold cross-validation,
but folds are specified manually with a `folds`

argument.

> folds = list(1:5000, 5001:15000, 15001:20000) > cv.custom = bn.cv(alarm, bn = "hc", algorithm.args = list(score = "bic"), + method = "custom-folds", folds = folds) > cv.custom

custom-folds cross-validation for Bayesian networks target learning algorithm: Hill-Climbing loss function: Log-Likelihood Loss (disc.) expected loss: 10.85651

The two methods are otherwise identical, as shown below: the `R`

objects returned by
`bn.cv()`

are identical (with the exception of the method's label) if the same folds are
used.

> cv.random = bn.cv(alarm, bn = "hc", method = "k-fold") > folds = lapply(cv.random, `[[`, "test") > cv.custom = bn.cv(alarm, bn = "hc", method = "custom-folds", + folds = folds) > all.equal(cv.random, cv.custom, check.attributes = FALSE)

[1] TRUE

Note that:

- Each observation can only be included in a single fold.
> folds = list(1:5000, 4000:10000, 10001:20000) > bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds)

## Error in check.single.run(folds = folds, n = n): some observations are included in more than one fold.

- Each fold must contain at least one observation.
> folds = list(numeric(0), 1:2000, 2001:20000) > bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds)

## Error in check.single.run(folds = folds, n = n): some folds contain no observations.

- All observations must be assigned to the folds.
> folds = list(1:1000, 2000:20000) > bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds)

## Error in check.single.run(folds = folds, n = n): not all observations are assigned to a fold.

`runs`

is not a valid argument for this cross-validation method, because the ability of performing multiple runs relies on splitting the data randomly into folds.

To perform multiple runs, each set of manually specified folds (one for each `run`

)
should be saved in a list.

> folds = list( + list(1:5000, 5001:10000, 10001:20000), + list(1:10000, 10001:15000, 15001:20000), + list(1:5000, 5001:15000, 15001:20000) + ) > bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds)

custom-folds cross-validation for Bayesian networks target learning algorithm: Hill-Climbing loss function: Log-Likelihood Loss (disc.) number of runs: 3 average loss over the runs: 10.86871 standard deviation of the loss: 0.01063207

The following function can be used to generate the folds randomly,

> random.folds = function(data, k) split(sample(nrow(data)), seq(k))

for a single run or cross-validation:

> folds = random.folds(data = alarm, k = 10) > bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds)

custom-folds cross-validation for Bayesian networks target learning algorithm: Hill-Climbing loss function: Log-Likelihood Loss (disc.) expected loss: 10.86583

or for multiple runs.

> folds = replicate(10, random.folds(data = alarm, k = 10), simplify = FALSE) > bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds)

custom-folds cross-validation for Bayesian networks target learning algorithm: Hill-Climbing loss function: Log-Likelihood Loss (disc.) number of runs: 10 average loss over the runs: 10.87163 standard deviation of the loss: 0.003236653

#### Hold-out cross-validation

The syntax for hold-out cross-validation is very similar to that for *k*-fold
cross-validation, but:

`method`

must be set to "`hold-out`

";`k`

denotes the number of times the sample will be split into training and test subsamples (the default is 10);- and optionally
`m`

denotes the number of observations to be sampled for the test subsample (the default is 1/10th of the sample size).

So, for instance:

> bn.cv(alarm, bn = "hc", method = "hold-out", k = 25, m = 200, + algorithm.args = list(score = "bde", iss = 10))

hold-out cross-validation for Bayesian networks target learning algorithm: Hill-Climbing number of splits: 25 size of the test subset: 200 loss function: Log-Likelihood Loss (disc.) expected loss: 10.73383

### Comparing different network structures

`bn.cv()`

can also be used to compare the goodness of fit of different network
structures suggested by expert knowledge; using all the data to fit the parameters of those networks
and to evaluate their goodness of fit produces optimistically biased results.

Let's start with the most obvious comparison: the true network structure against the empty one.

> res = empty.graph(names(alarm)) > modelstring(res) = paste("[HIST|LVF][CVP|LVV][PCWP|LVV][HYP][LVV|HYP:LVF]", + "[LVF][STKV|HYP:LVF][ERLO][HRBP|ERLO:HR][HREK|ERCA:HR][ERCA]", + "[HRSA|ERCA:HR][ANES][APL][TPR|APL][ECO2|ACO2:VLNG][KINK]", + "[MINV|INT:VLNG][FIO2][PVS|FIO2:VALV][SAO2|PVS:SHNT][PAP|PMB][PMB]", + "[SHNT|INT:PMB][INT][PRSS|INT:KINK:VTUB][DISC][MVS][VMCH|MVS]", + "[VTUB|DISC:VMCH][VLNG|INT:KINK:VTUB][VALV|INT:VLNG][ACO2|VALV]", + "[CCHL|ACO2:ANES:SAO2:TPR][HR|CCHL][CO|HR:STKV][BP|CO:TPR]", sep = "") > bn.cv(alarm, res, algorithm.args = list(score = "aic"))

k-fold cross-validation for Bayesian networks target network structure: [FIO2][MVS][HYP][LVF][APL][ANES][PMB][INT][KINK][DISC][ERLO][ERCA] [HIST|LVF][TPR|APL][PAP|PMB][LVV|HYP:LVF][STKV|HYP:LVF][SHNT|PMB:INT] [VMCH|MVS][CVP|LVV][PCWP|LVV][VTUB|DISC:VMCH][PRSS|INT:KINK:VTUB] [VLNG|INT:KINK:VTUB][MINV|INT:VLNG][VALV|INT:VLNG][PVS|FIO2:VALV] [ACO2|VALV][SAO2|SHNT:PVS][ECO2|ACO2:VLNG][CCHL|TPR:SAO2:ANES:ACO2] [HR|CCHL][CO|STKV:HR][HRBP|ERLO:HR][HREK|HR:ERCA][HRSA|HR:ERCA] [BP|TPR:CO] number of folds: 10 loss function: Log-Likelihood Loss (disc.) expected loss: 10.81851

> bn.cv(alarm, empty.graph(names(alarm)))

k-fold cross-validation for Bayesian networks target network structure: [CVP][PCWP][HIST][TPR][BP][CO][HRBP][HREK][HRSA][PAP][SAO2][FIO2][PRSS] [ECO2][MINV][MVS][HYP][LVF][APL][ANES][PMB][INT][KINK][DISC][LVV][STKV] [CCHL][ERLO][HR][ERCA][SHNT][PVS][ACO2][VALV][VLNG][VTUB][VMCH] number of folds: 10 loss function: Log-Likelihood Loss (disc.) expected loss: 21.468

As expected, the first (true) network is a better fit than the second (empty network). The same is true for the following structure, which generated at random and has no relationship with the data.

> rand = random.graph(names(alarm)) > rand

Random/Generated Bayesian network model: [CVP][PCWP][TPR][BP][CO][HRBP][HREK][PAP][PRSS][ECO2][MINV][LVF][APL] [ANES][KINK][HR][ERCA][HIST|PCWP][SAO2|HREK][PMB|PCWP][INT|PAP] [LVV|BP:ECO2:KINK][PVS|CO:HREK:LVF][VALV|ANES][VLNG|HRBP] [HRSA|PCWP:HIST:TPR][FIO2|HIST:PAP][DISC|HIST:APL][ERLO|SAO2:PMB] [VTUB|CVP:KINK:LVV][MVS|TPR:FIO2][STKV|PRSS:KINK:DISC][CCHL|DISC] [SHNT|HRSA][VMCH|PCWP:HRSA][HYP|FIO2:MVS][ACO2|HYP] nodes: 37 arcs: 36 undirected arcs: 0 directed arcs: 36 average markov blanket size: 2.86 average neighbourhood size: 1.95 average branching factor: 0.97 generation algorithm: Full Ordering arc sampling probability: 0.05555556

> bn.cv(alarm, bn = rand)

k-fold cross-validation for Bayesian networks target network structure: [CVP][PCWP][TPR][BP][CO][HRBP][HREK][PAP][PRSS][ECO2][MINV][LVF][APL] [ANES][KINK][HR][ERCA][HIST|PCWP][SAO2|HREK][PMB|PCWP][INT|PAP] [LVV|BP:ECO2:KINK][PVS|CO:HREK:LVF][VALV|ANES][VLNG|HRBP] [HRSA|PCWP:HIST:TPR][FIO2|HIST:PAP][DISC|HIST:APL][ERLO|SAO2:PMB] [VTUB|CVP:KINK:LVV][MVS|TPR:FIO2][STKV|PRSS:KINK:DISC][CCHL|DISC] [SHNT|HRSA][VMCH|PCWP:HRSA][HYP|FIO2:MVS][ACO2|HYP] number of folds: 10 loss function: Log-Likelihood Loss (disc.) expected loss: 21.24157

### Cross-validation and predictive error

Cross-validation is also commonly used in classification problems to compute the
*prediction error* of a particular model. We can do that with `bn.cv()`

by
specifying

`loss = "pred"`

to make frequentist predictions with`predict(..., method = "parents")`

; or`loss = "pred-lw"`

to make Bayesian predictions with`predict(..., method = "bayes-lw")`

;

and passing the label of the target node to the loss function via the `loss.args`

argument. The first predicts the target node from its parents; the second from all the available
nodes or from a subset of nodes specified in `loss.args`

list as element called
`from`

.

> xval = bn.cv(alarm, bn = "hc", loss = "pred", loss.args = list(target = "STKV")) > xval

k-fold cross-validation for Bayesian networks target learning algorithm: Hill-Climbing number of folds: 10 loss function: Classification Error training node: STKV expected loss: 0.1745

We may also want to compute other statistics from that run of cross-validation, like sensitivity
and specificity, or just to tabulate observed and predicted values. To that end, we can extract the
observed and predicted values for each fold from `xval`

as follows.

> OBS = unlist(lapply(xval, `[[`, "observed")) > PRED = unlist(lapply(xval, `[[`, "predicted")) > str(OBS)

Factor w/ 3 levels "HIGH","LOW","NORMAL": 3 3 3 1 3 3 3 2 2 3 ...

> str(PRED)

Factor w/ 3 levels "HIGH","LOW","NORMAL": 3 3 3 3 3 3 3 3 3 3 ...

> table(OBS, PRED)

PRED OBS HIGH LOW NORMAL HIGH 0 7 826 LOW 0 950 2624 NORMAL 0 33 15560

### Cross-validation and predictive correlation

Predictive correlation and mean square error are used to evaluate Gaussian Bayesian networks in
the same way as prediction error is used to evaluate discrete Bayesian networks. `bn.cv()`

implements:

`loss = "cor"`

to compute the correlation between observed values and frequentist predictions from`predict(..., method = "parents")`

;`loss = "cor-lw"`

to compute the correlation between observed values and frequentist predictions from`predict(..., method = "bayes-lw")`

;`loss = "mse"`

to compute the correlation between observed values and frequentist predictions from`predict(..., method = "parents")`

;`loss = "mse-lw"`

to compute the correlation between observed values and frequentist predictions from`predict(..., method = "bayes-lw")`

.

> bn.cv(gaussian.test, bn = "hc", loss = "cor", loss.args = list(target = "C"))

k-fold cross-validation for Bayesian networks target learning algorithm: Hill-Climbing number of folds: 10 loss function: Predictive Correlation training node: C expected loss: 0.9967891

> bn.cv(gaussian.test, bn = "hc", loss = "mse-lw", loss.args = list(target = "C"))

k-fold cross-validation for Bayesian networks target learning algorithm: Hill-Climbing number of folds: 10 loss function: Mean Squared Error (Posterior, Gauss.) training node: C expected loss: 0.2600608

`Sun Nov 5 19:42:02 2017`

with **bnlearn**

`4.3-20171001`

and `R version 3.0.2 (2013-09-25)`

.