Interfacing with the parallel R package

The parallel package provides a multi-platform implementation of the master-slave parallel programming model, and is the de facto standard way of doing parallel computing in R. Since most tasks in the application of Bayesian networks are computationally intensive, many functions in bnlearn have a cluster argument: they accept the cluster returned by parallel's makeCluster() and split their computations among the available slave processes.

Parallel structure learning

All constraint-based structure learning functions can be parallelised following Scutari (2017); Markov blankets and neighbourhoods of different nodes can be learned independently from each other, and later merged and reconciled to produce a coherent Bayesian network. First, we need to load the parallel package and initialise the cluster of slave processes, called cl below.

> library(parallel)
> library(bnlearn)
> cl = makeCluster(2)

Note that makeCluster() can create various types of clusters: they use different inter-process communication mechanisms to pass R code and objects between the master process and the slaves, and some may be more appropriate than others depending on the hardware. The parallel package also takes care of initialising the random number generators in the slaves so that they produce independent streams of random numbers, as we verify below.
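
For example, on Unix-like systems we could have created a FORK cluster instead of the default PSOCK cluster; a minimal sketch (the rest of this page keeps using the PSOCK cluster cl created above):

> ## FORK clusters (Unix-like systems only) are forked from the master process
> ## and thus start with a copy of its workspace, unlike PSOCK clusters.
> cl.fork = makeCluster(2, type = "FORK")
> stopCluster(cl.fork)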

> rand = clusterEvalQ(cl, runif(10))
> rand
[[1]]
 [1] 0.002436811 0.049513860 0.473682735 0.263340628 0.340408402 0.725151622
 [7] 0.809882759 0.655275954 0.129688345 0.551917126

[[2]]
 [1] 0.16864613 0.45673024 0.86247286 0.43624297 0.55656647 0.01948956
 [7] 0.63692326 0.15045371 0.95253830 0.28971345
> cor(rand[[1]], rand[[2]])
[1] -0.2071673

Alternatively, we can set a random seed with clusterSetRNGStream() to get reproducible results across the whole cluster.

> clusterSetRNGStream(cl, 42)
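
As a minimal check (the specific values drawn are irrelevant here), re-seeding the cluster with the same seed reproduces the same random number streams on the slaves:

> ## draw from the freshly seeded streams, re-seed, draw again: stopifnot()
> ## stays silent because the two sets of draws are identical.
> first = clusterEvalQ(cl, runif(3))
> clusterSetRNGStream(cl, 42)
> second = clusterEvalQ(cl, runif(3))
> stopifnot(identical(first, second))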

After the cluster is set up, performing parallel structure learning just means passing the cl object to any structure learning algorithm that takes a cluster argument, such as the si.hiton.pc() function.

> data(learning.test)
> pdag = si.hiton.pc(learning.test, cluster = cl)
> clusterEvalQ(cl, test.counter())
[[1]]
[1] 28

[[2]]
[1] 20
> test.counter()
[1] 0

Using test.counter(), we find that each slave executed about half of the conditional independence tests, while the master process executed none at all.

When we are done with all the parallel computations, we should remember to stop the cluster and kill the slave processes with stopCluster(); since the following sections keep reusing cl, we do that at the end of this page.

Parallel parameter learning

Parameter learning is embarrassingly parallel by construction, since each local distribution is independent of the others. Again, we just need to pass cl to bn.fit() to assign the computations to the slave processes.

> dag = cextend(pdag)
> fitted = bn.fit(dag, learning.test, cluster = cl)
> fitted$B

  Parameters of node B (multinomial distribution)

Conditional probability table:
 
   E
B            a          b          c
  a 0.73364245 0.46483590 0.15581098
  b 0.07521896 0.10113865 0.17305236
  c 0.19113859 0.43402545 0.67113665

Parallel cross-validation

It is also possible to parallelise cross-validation. When learning a Bayesian network with k-fold cross-validation, we are in fact learning k networks independently from different subsets of the data (each identified by the fold withheld for validation). Hence we can meaningfully use up to k slaves, learning the networks in parallel by passing cl to bn.cv().

> bn.cv(learning.test, "hc", k = 10, cluster = cl)

  k-fold cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing 
  number of folds:                       10 
  loss function:                         Log-Likelihood Loss (disc.) 
  expected loss:                         4.775594

This is also true for other kinds of cross-validation, such as hold-out cross-validation and custom folds.
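
For instance, here is a minimal sketch of hold-out cross-validation run in parallel, with k = 10 splits each withholding m = 100 observations for validation:

> ## hold-out cross-validation: k random splits, each one withholding m
> ## observations for validation; the networks are learned on the slaves.
> ho = bn.cv(learning.test, "hc", method = "hold-out", k = 10, m = 100,
+            cluster = cl)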

Parallel bootstrap

Another classic technique that is embarrassingly parallel is the bootstrap: each bootstrap sample is generated independently from the original sample, and the statistic of interest is then estimated independently on each bootstrap sample. This being the case, both bn.boot() and boot.strength() have a cluster argument to perform their computations in parallel.

For instance, we can use bn.boot() to compute the number of arcs (narcs) of the networks learned by a particular structure learning algorithm (hc()) from R = 10 bootstrap samples of size m = 100.

> A = bn.boot(learning.test, algorithm = "hc", cluster = cl,
+       statistic = narcs, R = 10, m = 100)
> unlist(A)
 [1] 4 4 4 4 5 3 4 5 5 4

Or we can use boot.strength() to compute the confidence in the presence and direction of each arc.

> confidence = boot.strength(learning.test, algorithm = "hc", cluster = cl)
> confidence[(confidence$strength > 0.50) & (confidence$direction >= 0.50), ]
   from to strength direction
1     A  B     1.00 0.5050000
3     A  D     1.00 0.8625000
9     B  E     1.00 0.9450000
13    C  D     0.99 0.8712121
30    F  E     1.00 0.9450000

Parallel approximate inference

Approximate inference techniques for Bayesian networks are typically embarrassingly parallel, since they are built on particle-based simulation methods whose particles are independent of each other. This makes it possible to generate particles in parallel on the slaves, compute an estimate of the desired (conditional) probability in each slave, and then return the average of those estimates as the overall estimated value.
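
Conceptually, this is similar to the following hand-rolled sketch, in which each slave produces its own estimate with cpquery() and the master averages them; bnlearn and the fitted object must first be made available on the slaves (the variable names are just illustrative):

> ## load bnlearn on the slaves and export the fitted network to them.
> invisible(clusterEvalQ(cl, library(bnlearn)))
> clusterExport(cl, "fitted")
> ## each slave computes its own estimate, and the master averages them.
> estimates = parSapply(cl, 1:2, function(i)
+     cpquery(fitted, event = (A == "a") & (B == "b"),
+             evidence = (F == "a"), method = "ls"))
> averaged = mean(estimates)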

In practice, we do not need to do any of this by hand: below we simply pass cl to cpquery(), for both logic sampling and likelihood weighting.

> cpquery(fitted, event = (A == "a") & (B == "b"),
+                 evidence = (F == "a"), method = "ls", cluster = cl)
[1] 0.008304912
> cpquery(fitted, event = (A == "a") & (B == "b"),
+                 evidence = list(F = "a"), method = "lw", cluster = cl)
[1] 0.009906337

In the same spirit, cpdist() can also generate random samples in parallel.

> cpdist(fitted, nodes = "A", evidence = (F == "a"), n = 10,
+   method = "ls", cluster = cl)$A
[1] c a c a
Levels: a b c
> cpdist(fitted, nodes = "A", evidence = list(F = "a"), n = 10,
+   method = "lw", cluster = cl)$A
 [1] c a b b a c c b a c
Levels: a b c

Parallel prediction and imputation

Different observations are usually taken to be independent, and therefore can be processed in parallel by both predict() and impute(): the predicted or imputed values in one observation are not influenced by the predictors or the observed values in a different one. For prediction:

> training.set = learning.test[1:4000, ]
> validation.set = learning.test[4001:4050, ]
> fitted = bn.fit(dag, training.set)
> predict(fitted, node = "F", data = validation.set, cluster = cl)
 [1] a a a b a b a a b a b b a a b b a a a a b a b a b b b a a a a b a a a b b b
[39] a a a a a a a b a a b b
Levels: a b

Similarly, for imputation:

> incomplete = learning.test[1:500, ]
> incomplete[1:100, "A"] = NA
> incomplete[101:200, "B"] = NA
> incomplete[201:250, "C"] = NA
> incomplete[251:300, "D"] = NA
> incomplete[301:350, "E"] = NA
> incomplete[351:400, "F"] = NA
> head(impute(fitted, data = incomplete, cluster = cl))
  A B C D E F
1 a c b a b b
2 a a c a b b
3 a a a a a a
4 a a a a b b
5 a a b c a a
6 c c a c c a
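
Now that we are done with all the examples, we can stop the cluster and kill the slave processes.

> stopCluster(cl)
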
Last updated on Sat Feb 17 23:29:39 2024 with bnlearn 5.0-20240208 and R version 4.3.2 (2023-10-31).