Comparing Bayesian network structures

The network structures of Bayesian networks are stored in objects of class bn (documented here); they can be learned from real-world data; they can be learned from synthetic data and golden-standard networks in simulations (examples here); or they can be created manually (see here). Having multiple bn objects, we are then interested in comparing those network structures to check whether they are different; and if they are different, which arcs are present (or not) only in some of them and how different they are.

bnlearn implements several functions for this task, all documented here and summarized below:

  • Summaries:
    • all.equal(): checks whether two networks have the same structure.
    • compare(): takes one network as a reference network, and computes the number of true positive, false positive and false negative arcs in the other network.
  • Structural distances:
    • hamming(): computes the Hamming distance between two networks.
    • shd(): computes the Structural Hamming distance (SHD) between two networks.
  • Visual (graphical) side-by-side comparison:
    • graphviz.compare(): takes one network as a reference network, and plots all other networks highlighting which are a different from the reference using Rgraphviz.

Summaries

The simplest, most elementary comparison is to check whether two networks are the same; meaning whether they have the same nodes and same arcs (including their direction). bnlearn provides an all.equal() method for bn objects that returns TRUE if the two networks are indeed the same; or a character string describing the difference. For instance, the networks may have different arcs;

> library(bnlearn)
> dag1 = model2network("[A][B|A][C|A]")
> dag2 = model2network("[A|B:C][B][C]")
> all.equal(dag1, dag2)
[1] "Different arc sets"
> arcs(dag1)
     from to 
[1,] "A"  "B"
[2,] "A"  "C"
> arcs(dag2)
     from to 
[1,] "B"  "A"
[2,] "C"  "A"

or it may have different nodes.

> dag3 = model2network("[A][B|A][C|A][D]")
> dag4 = model2network("[A][B|A][F|A]")
> all.equal(dag1, dag3)
[1] "Different number of nodes"
> all.equal(dag1, dag4)
[1] "Different node sets"

This approach using all.equal() is more robust than using other functions to compare objects available from R core, because the all.equal() method for bn will not say two networks are different just because the internals of the bn objects are different.

compare() takes two networks with the same nodes; the first is the target network, and the second is the current network. The latter will be compared to the former, which is taken to be the “true” or “golden standard” network. Thew default output of compare looks like this:

> compare(dag1, dag2)
$tp
[1] 0

$fp
[1] 2

$fn
[1] 2

The three elements of the list are the counts of:

  • the true positive (tp) arcs, which appear both in target and in current;
  • the false positive (fp) arcs, which appear in current but not in target;
  • the false negative (fn) arcs, which appear in target but not in current.

Setting arcs = TRUE makes compare() return the actual arcs that fall into each of these categories instead of just the counts.

> compare(dag1, dag2, arcs = TRUE)
$tp
     from to

$fp
     from to 
[1,] "B"  "A"
[2,] "C"  "A"

$fn
     from to 
[1,] "A"  "B"
[2,] "A"  "C"

Structural distances

Structural distance functions take two networks and return a number that says how much these networks are different from each other in terms of how many arcs differ between them. Being distances, the order in which the networks are passed as arguments does not matter since distances are symmetric by default.

hamming() implements Hamming distance, which simply counts the number of arcs that are different between two network. For undirected graphs, this means arcs that are present in one network and not in the other; for directed graphs, compare() just ignores arc directions and compares the skeletons of the two networks.

So, for example, dag1 and dag2 above have the same skeleton and therefore their Hamming distance is zero even though they have different arcs.

> all.equal(skeleton(dag1), skeleton(dag2))
[1] TRUE
> hamming(dag1, dag2)
[1] 0

However, if we add an additional arc to dag1 the Hamming distance between this new network and dag1 will reflect this since the two have different skeletons.

> dag5 = set.arc(dag1, from = "B", to = "C")
> all.equal(skeleton(dag1), skeleton(dag5))
[1] "Different number of directed/undirected arcs"
> hamming(dag1, dag5)
[1] 1

shd() implements the Structural Hamming distance, which counts how many arcs differ between the CPDAGs of the two networks. So, if we look at dag1 and dag2 again, these two networks will have a positive distance because dag2 contains a v-structure that dag1.

> all.equal(cpdag(dag1), cpdag(dag2))
[1] "Different number of directed/undirected arcs"
> shd(dag1, dag2)
[1] 2

Visual comparisons

A third approach to compare network structures is to plot them side by side and highlight differences with different colours. graphviz.compare() does that using Rgraphviz while taking care that the nodes have the same position for all networks, to make it easier to spot which arcs are different. As we can see below, and unlike any of the functions we covered above, graphviz.compare() can take more than two networks as arguments.

> dag6 = drop.arc(dag5, from = "A", to = "C")
> par(mfrow = c(3, 2))
> graphviz.compare(dag1, dag2, dag5, dag6)
Loading required namespace: Rgraphviz
plot of chunk unnamed-chunk-9

The default scheme to highlight differences (diff = "diff-from-first") in the arcs is:

  • the first network is taken as the “true” network;
  • in the other networks, true positive arcs are in black;
  • false positive arcs (which are missing or have different directions in the true network) are in red;
  • false negative arcs are in blue, and drawn using a dashed line.

The style of the graphical comparison can be further modified using the arguments that graphviz.compare() shares with graphviz.plot(), such as shape, layout, main and sub.

> par(mfrow = c(2, 2))
> graphviz.compare(dag1, dag2, dag5, dag6, shape = "rectangle", layout = "fdp",
+   main = c("ONE", "TWO", "THREE", "FOUR"),
+   sub = paste("SHD =", c("0", shd(dag1, dag2), shd(dag1, dag5), shd(dag1, dag6))))
plot of chunk unnamed-chunk-10

Other optional arguments specific to diff = "diff-from-first" can be passed as a list via the argument diff.args. They are:

  • tp.col, tp.lty, tp.lwd are the colour, line type and line width of true positive arcs;
  • fp.col, fp.lty, fp.lwd are the colour, line type and line width of false positive arcs;
  • fn.col, fn.lty, fn.lwd are the colour, line type and line width of negative arcs.

An example:

> par(mfrow = c(2, 2))
> graphviz.compare(dag1, dag2, dag5, dag6, shape = "rectangle", layout = "fdp",
+   main = c("ONE", "TWO", "THREE", "FOUR"),
+   sub = paste("SHD =", c("0", shd(dag1, dag2), shd(dag1, dag5), shd(dag1, dag6))),
+   diff.args = list(tp.lwd = 2, tp.col = "green", fn.col = "orange"))
## Error in slot(g@renderInfo, what)[[i]][[j]] <- repl: replacement has length zero
plot of chunk unnamed-chunk-11
Last updated on Sun Jan 28 23:32:22 2018 with bnlearn 4.3 and R version 3.0.2 (2013-09-25).