## Boostrap-based inference

### The general case

A general-purpose bootstrap implementation, similar in scope to the `boot()`

function
in package **boot**, is provided by the `bn.boot()`

function (documented
here). `bn.boot()`

takes a data set, a
structure learning algorithm and an arbitrary function (whose first argument must be an object of
class `bn`

) and returns a list of the values returned by said function for the network
structures learned from the bootstrapped samples.

For example, we may want to know how many arcs we can expect in a network learned with hill
climbing from the `learning.test`

data set (documented
here). We can do it as follows:

> library(bnlearn)

> unlist(bn.boot(learning.test, statistic = narcs, + algorithm = "hc", R = 10))

[1] 5 5 5 5 5 5 5 5 5 5

The `R`

argument controls how many bootstrap replicates are performed. Or maybe we
want to compare the computational complexity (measured with the numbers of test/score comparisons)
between hill climbing and Grow-Shrink for a sample size of 500:

> unlist(bn.boot(learning.test, statistic = ntests, + algorithm = "hc", R = 10))

[1] 40 40 40 40 40 40 40 40 40 40

> unlist(bn.boot(learning.test, statistic = ntests, + algorithm = "gs", R = 10))

[1] 104 102 101 105 106 112 104 103 108 103

Many other questions can be answered with this approach; essentially any function of the network
structure can be used for the `statistic`

argument. We can also return the structures
themselves using a dummy function as follows.

> bn.boot(learning.test, statistic = function(x) x, + algorithm = "hc", R = 2)

[[1]] Bayesian network learned via Score-based methods model: [A][C][F][B|A][D|A:C][E|B:F] nodes: 6 arcs: 5 undirected arcs: 0 directed arcs: 5 average markov blanket size: 2.33 average neighbourhood size: 1.67 average branching factor: 0.83 learning algorithm: Hill-Climbing score: BIC (disc.) penalization coefficient: 4.258597 tests used in the learning procedure: 40 optimized: TRUE [[2]] Bayesian network learned via Score-based methods model: [A][C][F][B|A][D|A:C][E|B:F] nodes: 6 arcs: 5 undirected arcs: 0 directed arcs: 5 average markov blanket size: 2.33 average neighbourhood size: 1.67 average branching factor: 0.83 learning algorithm: Hill-Climbing score: BIC (disc.) penalization coefficient: 4.258597 tests used in the learning procedure: 40 optimized: TRUE

In addition, we can control how many observations are included in each bootstrap sample with the
`m`

argument.

> unlist(bn.boot(learning.test, statistic = narcs, + algorithm = "hc", R = 10, m = 50))

[1] 2 4 2 3 4 3 4 5 3 4

### Measuring arc strength

Measuring the degree of *confidence* in a particular graphical feature of a Bayesian
network is a key problem in the inference on the network structure. In the case of single arcs this
quantity is called *arc strength*.

Friedman, Goldszmidt and Wyner (1999)
introduced a very simple way of quantifying such a confidence: generating multiple network structures
by applying nonparametric bootstrap to the data and estimating the relative frequency of the feature
of interest. `boot.strength()`

uses this approach to compute the strength of every possible
arc, and has a syntax similar to that of `bn.boot()`

.

> library(bnlearn) > boot.strength(learning.test, algorithm = "hc")

from to strength direction 1 A B 1.00 0.500 2 A C 0.00 0.000 3 A D 1.00 1.000 4 A E 0.00 0.000 5 A F 0.00 0.000 6 B A 1.00 0.500 7 B C 0.01 0.500 8 B D 0.00 0.000 9 B E 1.00 0.995 10 B F 0.01 0.500 11 C A 0.00 0.000 12 C B 0.01 0.500 13 C D 1.00 1.000 14 C E 0.00 0.000 15 C F 0.01 0.500 16 D A 1.00 0.000 17 D B 0.00 0.000 18 D C 1.00 0.000 19 D E 0.00 0.000 20 D F 0.00 0.000 21 E A 0.00 0.000 22 E B 1.00 0.005 23 E C 0.00 0.000 24 E D 0.00 0.000 25 E F 1.00 0.005 26 F A 0.00 0.000 27 F B 0.01 0.500 28 F C 0.01 0.500 29 F D 0.00 0.000 30 F E 1.00 0.995

Note that this approach computes the joint strength of all the possible arcs; the estimates will
not be independent. For each pair of nodes, the probability that there is an arc between them
regardless of its direction is stored in the `direction`

column, and the probability of
each direction is in the `direction`

column. This parameterization follows
Imoto et al. (2002).

`Sun Nov 5 19:47:03 2017`

with **bnlearn**

`4.3-20171001`

and `R version 3.0.2 (2013-09-25)`

.