## Creating Bayesian network structures

The graph structure of a Bayesian network is stored in an object of class `bn`

(documented here). We can create such an
object in various ways through three possible representations: the *arc set* of the
graph, its *adjacency matrix* or a *model formula*. In addition, we can also
generate *empty* and *random* network structures with the
`empty.graph()`

and `random.graph()`

functions (both documented
here).

### Creating an empty network

> library(bnlearn)

> e = empty.graph(LETTERS[1:6]) > class(e)

[1] "bn"

> e

Random/Generated Bayesian network model: [A][B][C][D][E][F] nodes: 6 arcs: 0 undirected arcs: 0 directed arcs: 0 average markov blanket size: 0.00 average neighbourhood size: 0.00 average branching factor: 0.00 generation algorithm: Empty

Note that by default `empty.graph()`

returns a single object of class
`bn`

. We can generate multiple graphs in a single batch by specifying the
`num`

argument of `empty.graph()`

; in that case the graphs will be stored
in a list of `bn`

objects.

> empty.graph(LETTERS[1:6], num = 2)

[[1]] Random/Generated Bayesian network model: [A][B][C][D][E][F] nodes: 6 arcs: 0 undirected arcs: 0 directed arcs: 0 average markov blanket size: 0.00 average neighbourhood size: 0.00 average branching factor: 0.00 generation algorithm: Empty [[2]] Random/Generated Bayesian network model: [A][B][C][D][E][F] nodes: 6 arcs: 0 undirected arcs: 0 directed arcs: 0 average markov blanket size: 0.00 average neighbourhood size: 0.00 average branching factor: 0.00 generation algorithm: Empty

### Creating a network structure

#### With a specific arc set

First, we create an arc set in the form of a matrix with two columns, optionally labelled
`from`

and `to`

; each row describes one arc using the labels of the nodes
at the tail (`from`

) and at the head (`to`

) of the arc.

> arc.set = matrix(c("A", "C", "B", "F", "C", "F"), + ncol = 2, byrow = TRUE, + dimnames = list(NULL, c("from", "to"))) > arc.set

from to [1,] "A" "C" [2,] "B" "F" [3,] "C" "F"

Then we can assign the newly created `arc.set`

to a `bn`

object using the
assignment version of the `arcs()`

function (documented
here).

> arcs(e) = arc.set > e

Random/Generated Bayesian network model: [A][B][D][E][C|A][F|B:C] nodes: 6 arcs: 3 undirected arcs: 0 directed arcs: 3 average markov blanket size: 1.33 average neighbourhood size: 1.00 average branching factor: 0.50 generation algorithm: Empty

Note that by default `arcs()`

performs several checks on the arc set, so:

- We cannot introduce arcs with node labels that are not present in the graph.
> bogus = matrix(c("X", "Y", "W", "Z"), + ncol = 2, byrow = TRUE, + dimnames = list(NULL, c("from", "to"))) > bogus

from to [1,] "X" "Y" [2,] "W" "Z"

> arcs(e) = bogus

## Error: node(s) 'X' 'W' 'Y' 'Z' not present in the graph.

- We cannot introduce cycles by mistake (
*e.g.*`A`

→`B`

→`C`

→`A`

) unless we set the`ignore.cycles`

to`TRUE`

.> cycle = matrix(c("A", "C", "C", "B", "B", "A"), + ncol = 2, byrow = TRUE, + dimnames = list(NULL, c("from", "to"))) > cycle

from to [1,] "A" "C" [2,] "C" "B" [3,] "B" "A"

> arcs(e) = cycle

## Error: the specified network contains cycles.

> arcs(e, ignore.cycles = TRUE) = cycle

## Error in `arcs<-`(`*tmp*`, ignore.cycles = TRUE, value = structure(c("A", : unused argument (ignore.cycles = TRUE)

> acyclic(e)

[1] TRUE

- We cannot introduce loops by mistake (
*e.g.*`A`

→`A`

).> loops = matrix(c("A", "A", "B", "B", "C", "D"), + ncol = 2, byrow = TRUE, + dimnames = list(NULL, c("from", "to"))) > loops

from to [1,] "A" "A" [2,] "B" "B" [3,] "C" "D"

> arcs(e) = loops

## Error: invalid arcs that are actually loops: ## A -> A ## B -> B

- We can, however, introduce undirected arcs (
*i.e.*edges) by including both directions of an arc in the arc set (*e.g.*`A`

→`B`

and`B`

→`A`

) as long as it is clear no cycle would be introduced by either direction. Again, specifying`ignore.cycles`

overrides this check.> edges = matrix(c("A", "B", "B", "A", "C", "D"), + ncol = 2, byrow = TRUE, + dimnames = list(NULL, c("from", "to"))) > edges

from to [1,] "A" "B" [2,] "B" "A" [3,] "C" "D"

> arcs(e) = edges

#### With a specific adjacency matrix

We can create a graph structure using an adjaceny matrix with `amat()`

(documented
here) in the same way as we did above using an arc set
with `arcs()`

. First, we create an adjacency matrix with 0/1 integer elements (note the
use of `0L`

and `1L`

) with the 1s corresponding to the arcs. Using a matrix
with numeric, non-integer entries (*e.g.* `0`

and `1`

instead of
`0L`

and `1L`

) works as well but uses more memory, which may be critical in
large graphs.

> adj = matrix(0L, ncol = 6, nrow = 6, + dimnames = list(LETTERS[1:6], LETTERS[1:6])) > adj["A", "C"] = 1L > adj["B", "F"] = 1L > adj["C", "F"] = 1L > adj["D", "E"] = 1L > adj["A", "E"] = 1L > adj

A B C D E F A 0 0 1 0 1 0 B 0 0 0 0 0 1 C 0 0 0 0 0 1 D 0 0 0 0 1 0 E 0 0 0 0 0 0 F 0 0 0 0 0 0

Then we introduce the arcs in the graph with the assignment version of `amat()`

.

> amat(e) = adj > e

Random/Generated Bayesian network model: [A][B][D][C|A][E|A:D][F|B:C] nodes: 6 arcs: 5 undirected arcs: 0 directed arcs: 5 average markov blanket size: 2.33 average neighbourhood size: 1.67 average branching factor: 0.83 generation algorithm: Empty

Note that `amat()`

performs the same checks as `arcs()`

to prevent the
creation of misspecified graph structures.

#### With a specific model formula

The format of the model formula is originally from the **deal** package
(CRAN), and is defined as follows:

- each node is enclosed in square brackets; and
- its parents are listed after a pipe and separated by colons, also within the same set of square brackets.

Such a formula can be used in two ways to define a graph structure:

- With the
`model2network()`

function (documented here), without creating an empty graph first.> model2network("[A][C][B|A][D|C][F|A:B:C][E|F]")

Random/Generated Bayesian network model: [A][C][B|A][D|C][F|A:B:C][E|F] nodes: 6 arcs: 6 undirected arcs: 0 directed arcs: 6 average markov blanket size: 2.67 average neighbourhood size: 2.00 average branching factor: 1.00 generation algorithm: Empty

- With
`modelstring()`

(documented here) and an existing`bn`

object, as for`arcs()`

and`amat()`

above.> modelstring(e) = "[A][C][B|A][D|C][F|A:B:C][E|F]" > e

Random/Generated Bayesian network model: [A][C][B|A][D|C][F|A:B:C][E|F] nodes: 6 arcs: 6 undirected arcs: 0 directed arcs: 6 average markov blanket size: 2.67 average neighbourhood size: 2.00 average branching factor: 1.00 generation algorithm: Empty

Again, note that both `model2network()`

and `modelstring()`

perform checks
on the model formula to prevent the creation of misspecified graphs.

### Creating one or more random network structures

#### With a specified node ordering

By default, `random.graph()`

generates graphs consistent with the ordering of the
nodes provided by the user; in each arc, the node at the tail precedes the node at the head in the
node set. Arcs are samples independently with a probability of inclusion specified by the
`prob`

argument, whose default is set so that there are on average as many arcs as nodes
in the graph.

> random.graph(LETTERS[1:6], prob = 0.1)

Random/Generated Bayesian network model: [A][B][C][E][F][D|A] nodes: 6 arcs: 1 undirected arcs: 0 directed arcs: 1 average markov blanket size: 0.33 average neighbourhood size: 0.33 average branching factor: 0.17 generation algorithm: Full Ordering arc sampling probability: 0.1

As was the case for `empty.graph()`

, we can generate multiple graphs in one batch
using the `num`

argument. In that case, graphs are returned in a list which is very
convenient to pass to `lapply()`

for further computations.

#### Sampling from the space of connected directed acyclic graphs with uniform probability

In addition to the default `method = "ordered"`

, we can also set
`method = "ic-dag"`

to sample with uniform probability from the space of connected
directed acyclic graphs using Ide & Cozman MCMC sampler. Note that the suggested length of
the burn-in scales quadratically with the number of nodes, so generating multiple graphs in a
single batch with `num`

is much more efficient than generating them one at a time.

> random.graph(LETTERS[1:6], num = 2, method = "ic-dag")

[[1]] Random/Generated Bayesian network model: [C][F][A|C][E|A:C][D|C:E][B|A:C:D:F] nodes: 6 arcs: 9 undirected arcs: 0 directed arcs: 9 average markov blanket size: 4.33 average neighbourhood size: 3.00 average branching factor: 1.50 generation algorithm: Ide & Cozman's Multiconnected DAGs burn in length: 216 maximum in-degree: Inf maximum out-degree: Inf maximum degree: Inf [[2]] Random/Generated Bayesian network model: [C][F][A|C][E|A:C][D|C:E:F][B|A:C:D:F] nodes: 6 arcs: 10 undirected arcs: 0 directed arcs: 10 average markov blanket size: 4.67 average neighbourhood size: 3.33 average branching factor: 1.67 generation algorithm: Ide & Cozman's Multiconnected DAGs burn in length: 216 maximum in-degree: Inf maximum out-degree: Inf maximum degree: Inf

This methods accepts several optional constraints on the structure of the generated graphs, as specified by the following arguments:

`burn.in`

: the number of iterations for the algorithm to converge to a stationary (and uniform) probability distribution.`every`

: return only one graph every number of steps instead of all the generated graphs. High values of every result in a more diverse set.`max.degree`

: the maximum degree of a node.`max.in.degree`

: the maximum in-degree.`max.out.degree`

: the maximum out-degree.

So, for example:

> random.graph(LETTERS[1:6], num = 2, method = "ic-dag", burn.in = 10^4, + every = 50, max.degree = 3)

[[1]] Random/Generated Bayesian network model: [F][D|F][A|D:F][B|D][C|A:F][E|B] nodes: 6 arcs: 7 undirected arcs: 0 directed arcs: 7 average markov blanket size: 2.33 average neighbourhood size: 2.33 average branching factor: 1.17 generation algorithm: Ide & Cozman's Multiconnected DAGs burn in length: 10000 maximum in-degree: Inf maximum out-degree: Inf maximum degree: 3 [[2]] Random/Generated Bayesian network model: [D][E][F|E][A|D:F][C|A:F][B|C] nodes: 6 arcs: 6 undirected arcs: 0 directed arcs: 6 average markov blanket size: 2.33 average neighbourhood size: 2.00 average branching factor: 1.00 generation algorithm: Ide & Cozman's Multiconnected DAGs burn in length: 10000 maximum in-degree: Inf maximum out-degree: Inf maximum degree: 3

#### Sampling from the space of the directed acyclic graphs with uniform probability

Melançon's MCMC algorithm samples with uniform probability from the space of
directed acyclic graphs (not necessarily connected); we can use it by specifying
`method = "melancon"`

.

> random.graph(LETTERS[1:6], method = "melancon")

Random/Generated Bayesian network model: [D][F][B|D:F][C|D:F][A|B:D][E|A:B:F] nodes: 6 arcs: 9 undirected arcs: 0 directed arcs: 9 average markov blanket size: 3.67 average neighbourhood size: 3.00 average branching factor: 1.50 generation algorithm: Melancon's Uniform Probability DAGs burn in length: 216 maximum in-degree: Inf maximum out-degree: Inf maximum degree: Inf

As far as optional arguments are concerned, it is identical to Ide & Cozman's algorithm.

`Sun Nov 5 20:54:10 2017`

with **bnlearn**

`4.3-20171001`

and `R version 3.0.2 (2013-09-25)`

.