## Creating custom fitted Bayesian networks

In general, there are three ways of creating a `bn.fit`

object representing a
Bayesian network:

- a
*data-driven*approach, learning it from a data set using`bn.fit()`

and a network structure (in a`bn`

object) as illustrated here; - an
*expert-driven*approach, in which both the network structure and the parameters are specified by the user; - a
*hybrid*approach combining the above.

How the parameters are specified depends on the type of data the Bayesian network encodes (discrete, continuous, a mixture of the two). The relevant functions are:

`custom.fit()`

(manual page), which takes a`bn`

object encoding the network structure and a list with the parameters of the local distributions of the nodes;- the assignment operator for
`bn.fit`

objects, which replaces the parameters of a single local distribution.

### Creating custom fitted Bayesian networks using expert knowledge

#### Discrete networks

For discrete data, the parameters of the local distribution are specified as conditional probability tables. Each column of such a table represents the probability distribution of the node conditional on a particular configuration of levels of the parents.

> library(bnlearn)

> cptA = matrix(c(0.4, 0.6), ncol = 2, dimnames = list(NULL, c("LOW", "HIGH"))) > cptA

LOW HIGH [1,] 0.4 0.6

> cptB = matrix(c(0.8, 0.2), ncol = 2, dimnames = list(NULL, c("GOOD", "BAD"))) > cptB

GOOD BAD [1,] 0.8 0.2

> cptC = c(0.5, 0.5, 0.4, 0.6, 0.3, 0.7, 0.2, 0.8) > dim(cptC) = c(2, 2, 2) > dimnames(cptC) = list("C" = c("TRUE", "FALSE"), "A" = c("LOW", "HIGH"), + "B" = c("GOOD", "BAD")) > cptC

, , B = GOOD A C LOW HIGH TRUE 0.5 0.4 FALSE 0.5 0.6 , , B = BAD A C LOW HIGH TRUE 0.3 0.2 FALSE 0.7 0.8

The tables are passed to `custom.fit()`

as a list whose names are the labels of the
nodes the conditional probability tables will be assigned to.

> net = model2network("[A][B][C|A:B]") > dfit = custom.fit(net, dist = list(A = cptA, B = cptB, C = cptC)) > dfit

Bayesian network parameters Parameters of node A (multinomial distribution) Conditional probability table: LOW HIGH 0.4 0.6 Parameters of node B (multinomial distribution) Conditional probability table: GOOD BAD 0.8 0.2 Parameters of node C (multinomial distribution) Conditional probability table: , , B = GOOD A C LOW HIGH TRUE 0.5 0.4 FALSE 0.5 0.6 , , B = BAD A C LOW HIGH TRUE 0.3 0.2 FALSE 0.7 0.8

If one or more nodes in the network are associated with ordinal variables, their labels should
be passed to `custom.fit()`

with the `ordinal`

argument; all other arguments
are the same as above.

> # for ordinal nodes it is nearly the same. > dfit = custom.fit(net, dist = list(A = cptA, B = cptB, C = cptC), + ordinal = c("A", "B")) > dfit

Bayesian network parameters Parameters of node A (ordinal distribution) Conditional probability table: LOW HIGH 0.4 0.6 Parameters of node B (ordinal distribution) Conditional probability table: GOOD BAD 0.8 0.2 Parameters of node C (multinomial distribution) Conditional probability table: , , B = GOOD A C LOW HIGH TRUE 0.5 0.4 FALSE 0.5 0.6 , , B = BAD A C LOW HIGH TRUE 0.3 0.2 FALSE 0.7 0.8

Note that `custom.fit()`

performs several checks on the local distributions:

- The probabilities should sum up to one in each (conditional) distribution.
> cpt.bad = matrix(c(0.8, 0.6), ncol = 2, dimnames = list(NULL, c("LOW", "HIGH"))) > custom.fit(net, dist = list(A = cpt.bad, B = cptB, C = cptC))

## Error: the probability distribution of node A does not sum to one.

- The dimension names must be consistent between different local distributions (the names of
the levels of the parents are considered authoritative, so the error messages mention the child
node).
> cpt.bad = matrix(c(0.4, 0.6), ncol = 2, dimnames = list(NULL, c("BOGUS", "WRONG"))) > custom.fit(net, dist = list(A = cpt.bad, B = cptB, C = cptC))

## Error: the levels of the parent A of node C do not match.

- The dimensions must also be consistent across different local distributions.
> cpt.bad = matrix(c(0.4, 0.5, 0.1), ncol = 3, dimnames = list(NULL, c("LOW", "AVERAGE", "HIGH"))) > custom.fit(net, dist = list(A = cpt.bad, B = cptB, C = cptC))

## Error: the levels of the parent A of node C do not match.

- The probabilities are checked for missing values.
> cpt.bad = matrix(c(0.4, NaN), ncol = 2, dimnames = list(NULL, c("LOW", "HIGH"))) > custom.fit(net, dist = list(A = cpt.bad, B = cptB, C = cptC))

## Error: some elements of the conditional probability distributions of node A are not probabilities.

#### Continuous networks

Continuous nodes are modelled as linear regressions in Gaussian Bayesian networks; as such, the
relevant parameters of each local distribution are the regression coefficients (one for each of the
parents) and the standard deviation of the residuals. Both should be stored as the elements of a
list, respectively names `coef`

and `sd`

.

> distA = list(coef = c("(Intercept)" = 2), sd = 1) > distB = list(coef = c("(Intercept)" = 1), sd = 1.5) > distC = list(coef = c("(Intercept)" = 0.5, "A" = 0.75, "B" = 1.32), sd = 0.4) > cfit = custom.fit(net, dist = list(A = distA, B = distB, C = distC)) > cfit

Bayesian network parameters Parameters of node A (Gaussian distribution) Conditional density: A Coefficients: (Intercept) 2 Standard deviation of the residuals: 1 Parameters of node B (Gaussian distribution) Conditional density: B Coefficients: (Intercept) 1 Standard deviation of the residuals: 1.5 Parameters of node C (Gaussian distribution) Conditional density: C | A + B Coefficients: (Intercept) A B 0.50 0.75 1.32 Standard deviation of the residuals: 0.4

In addition, we can include fitted values and residuals for each node.

> distA = list(coef = c("(Intercept)" = 2), fitted = 1:10, resid = rnorm(10)) > distB = list(coef = c("(Intercept)" = 1), fitted = 3:12, resid = rnorm(10)) > distC = list(coef = c("(Intercept)" = 0.5, "A" = 0.75, "B" = 1.32), + fitted = 2:11, resid = rnorm(10)) > cfit = custom.fit(net, dist = list(A = distA, B = distB, C = distC))

However, either all the nodes have both fitted values and residuals or none of them. If residuals
are provided, `sd`

is computed automatically by `custom.fit()`

.

Another option is to use `lm()`

to produce the parameter for each node, which is
convenient when we need to use advanced regression options such as weights or we have missing values
in the data.

> distA = lm(A ~ 1, data = gaussian.test) > distB = lm(B ~ 1, data = gaussian.test) > distC = lm(C ~ A + B, data = gaussian.test) > cfit = custom.fit(net, dist = list(A = distA, B = distB, C = distC))

We can also use `penalized()`

to fit ridge and elastic net regressions as local
distributions instead of using ordinary least squares estimates. (Note that using LASSO does not
make sense in this context, all parents of a node are assumed to have non-zero coefficients or they
would not be parents at all.)

> library(penalized)

> distC = penalized(C ~ A + B, data = gaussian.test, + lambda1 = 0, lambda2 = 0.5, trace = FALSE) > cfit = custom.fit(net, dist = list(A = distA, B = distB, C = distC))

Again note that `custom.fit()`

performs several checks on the local distributions:

- We must have exactly one regression coeffient for each parent of a node, with labels to match.
> distA = list(coef = c("(Intercept)" = 2), sd = 1) > distB = list(coef = c("(Intercept)" = 1, "A" = 2), sd = 1.5) > distC = list(coef = c("(Intercept)" = 0.5, "A" = 0.75, "B" = 1.32), sd = 0.4) > cfit = custom.fit(net, dist = list(A = distA, B = distB, C = distC))

## Error: wrong number of coefficients for node B.

> distA = list(coef = c("(Intercept)" = 2), sd = 1) > distB = list(coef = c("(Intercept)" = 1), sd = 1.5) > distC = list(coef = c("(Intercept)" = 0.5, "A" = 0.75, "X" = 1.32), sd = 0.4) > cfit = custom.fit(net, dist = list(A = distA, B = distB, C = distC))

## Error: wrong regression coefficients for node C (B).

- If both the residuals and their standard deviation are specified, they must match up to (a
very lax) numerical precision.
> distA = list(coef = c("(Intercept)" = 2), fitted = 1:10, + resid = rnorm(10), sd = 20) > cfit = custom.fit(net, dist = list(A = distA, B = distB, C = distC))

## Error: the reported standard deviation of the residuals of node A does not match the observed one.

- Missing values are not allowed anywhere, even in the fitted values or in the residuals;
regression coefficients are checked to be finite and standard errors are also checked to be
non-negative.
> distA = list(coef = c("(Intercept)" = 2), fitted = c(NA, 2:10), + resid = rnorm(10)) > cfit = custom.fit(net, dist = list(A = distA, B = distB, C = distC))

## Error: fitted must be a vector of numeric values, the fitted values for node A given its parents.

> distA = list(coef = c("(Intercept)" = 2), sd = -1) > cfit = custom.fit(net, dist = list(A = distA, B = distB, C = distC))

## Error: sd must be non-negative, the standard deviation(s) of the residuals of node A.

#### Hybrid networks (mixed discrete and continuous)

In the case of hybrid data and conditional Gaussian Bayesian networks, the nature of each node and of its parents determine the form of the parameters of its local distribution:

- if the node is discrete, it can only have discrete parents and therefore the parameters of its local distribution come in the form of a conditional probability table;
- if the node is continuous and only has continuous parents, the parameters of its local distribution are the regression coefficients and the standard deviation of the residuals;
- if the node is continuous and only has at least one discrete parent, the regression coefficients for the continuous parents are stored in a matrix. Each row corresponds to a regression coefficient, and each column to one configuration of the levels of the discrete parents. The standard deviations of the residuals for the configurations of the levels of the discrete parents are stored in a vector.

In other words:

- discrete nodes are specified as in discrete Bayesian networks;
- continuous nodes with continuous parents are specifies as in Gaussian Bayesian networks;
- continuous nodes with discrete and (possibly) continuous parents get one set of regression coefficients and one standard deviation for each configuration of the discrete parents.

As a result specifying the parameters of each local distribution works as before in the first two cases.

> cptA = matrix(c(0.4, 0.6), ncol = 2, dimnames = list(NULL, c("LOW", "HIGH"))) > distB = list(coef = c("(Intercept)" = 1), sd = 1.5)

In the third case, the parameters are structured as follows.

> distC = list(coef = matrix(c(1.2, 2.3, 3.4, 4.5), ncol = 2, + dimnames = list(c("(Intercept)", "B"), NULL)), + sd = c(0.3, 0.6)) > distC

$coef [,1] [,2] (Intercept) 1.2 3.4 B 2.3 4.5 $sd [1] 0.3 0.6

> cgfit = custom.fit(net, dist = list(A = cptA, B = distB, C = distC)) > cgfit

Bayesian network parameters Parameters of node A (multinomial distribution) Conditional probability table: LOW HIGH 0.4 0.6 Parameters of node B (Gaussian distribution) Conditional density: B Coefficients: (Intercept) 1 Standard deviation of the residuals: 1.5 Parameters of node C (conditional Gaussian distribution) Conditional density: C | A + B Coefficients: 0 1 (Intercept) 1.2 3.4 B 2.3 4.5 Standard deviation of the residuals: 0 1 0.3 0.6 Discrete parents' configurations: A 0 LOW 1 HIGH

Fitted values and residuals can be specified for both Gaussian and conditional Gaussian nodes. In the former case, all parents of the node are continuous and therefore the parameters have the same form as for nodes in a Gaussian Bayesian network. In the latter, we must also specify which configuration of the discrete parents is associated with each fitted value and residual.

> fake = scale(rnorm(10)) > net = model2network("[A][B][C|A:B]") > cptA = matrix(c(0.4, 0.6), ncol = 2, dimnames = list(NULL, c("LOW", "HIGH"))) > distB = list(coef = c("(Intercept)" = 1), resid = fake, fitted = fake) > distC = list(coef = matrix(c(1.2, 2.3, 3.4, 4.5), ncol = 2, + dimnames = list(c("(Intercept)", "B"), NULL)), + resid = fake, fitted = fake, + configs = factor(c(rep(0, 5), rep(1, 5)))) > cgfit2 = custom.fit(net, dist = list(A = cptA, B = distB, C = distC))

### Creating custom fitted Bayesian networks using both data and expert knowledge

Specifying all the local distributions can be problematic in a large network. In most cases it
can be more convenient to create a `bn.fit`

object from (possibly dummy) data using
`bn.fit()`

, and then to modify just the local distributions of the nodes of interest.

- For discrete Bayesian networks (or discrete nodes in conditional Gaussian networks) we can
extract the conditional probability table stored in the
`bn.fit`

object with`coef()`

, modify it, and re-save it.> dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]") > fitted = bn.fit(dag, learning.test) > fitted$C

Parameters of node C (multinomial distribution) Conditional probability table: a b c 0.7434 0.2048 0.0518

> cpt = coef(fitted$C) > cpt[1:3] = c(0.50, 0.25, 0.25) > fitted$C = cpt > fitted$C

Parameters of node C (multinomial distribution) Conditional probability table: a b c 0.50 0.25 0.25

- For Gaussian networks (and Gaussian nodes is conditional Gaussian networks) we can do the
same or just assign a new set of regression coefficients.
We can also use
> dag = model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]") > fitted = bn.fit(dag, gaussian.test) > fitted$F

Parameters of node F (Gaussian distribution) Conditional density: F | A + D + E + G Coefficients: (Intercept) A D E G -0.006047321 1.994853041 1.005636909 1.002577002 1.494373265 Standard deviation of the residuals: 0.9961475

> fitted$F = list(coef = c(0, 2, 1, 0.5, 0.5), sd = 0.42) > fitted$F

Parameters of node F (Gaussian distribution) Conditional density: F | A + D + E + G Coefficients: (Intercept) A D E G 0.0 2.0 1.0 0.5 0.5 Standard deviation of the residuals: 0.42

`lm()`

or`penalized()`

to re-estimate the regression coefficients from data in a different way.> fitted$F = penalized(F ~ A + D + E + G, data = gaussian.test, lambda2 = 5, + trace = FALSE) > fitted$F

Parameters of node F (Gaussian distribution) Conditional density: F | A + D + E + G Coefficients: (Intercept) A D E G -0.0007259323 1.9928727472 1.0055773818 1.0023303829 1.4939902546 Standard deviation of the residuals: 0.9961499

- For conditional Gaussian nodes,
> dag = model2network("[A][B][C][H][D|A:H][F|B:C][E|B:D][G|A:D:E:F]") > fitted = bn.fit(dag, clgaussian.test) > fitted$G

Parameters of node G (conditional Gaussian distribution) Conditional density: G | A + D + E + F Coefficients: 0 1 2 3 (Intercept) 4.949135 2.384710 3.485634 2.140170 D 2.252665 4.066430 2.989924 5.806728 E 1.002022 1.000199 1.003836 1.001002 Standard deviation of the residuals: 0 1 2 3 0.05456809 0.14946301 0.26272612 0.35182957 Discrete parents' configurations: A F 0 a a 1 b a 2 a b 3 b b

> beta = coef(fitted$G) > beta[, "3"] = c(2, 1, 4) > fitted$G = list(coef = beta, sd = c(0.5, 0.5, 0.5, 0.5)) > fitted$G

Parameters of node G (conditional Gaussian distribution) Conditional density: G | A + D + E + F Coefficients: 0 1 2 3 (Intercept) 4.949135 2.384710 3.485634 2.000000 D 2.252665 4.066430 2.989924 1.000000 E 1.002022 1.000199 1.003836 4.000000 Standard deviation of the residuals: 0 1 2 3 0.5 0.5 0.5 0.5 Discrete parents' configurations: A F 0 a a 1 b a 2 a b 3 b b

`Sun Nov 5 20:44:03 2017`

with **bnlearn**

`4.3-20171001`

and `R version 3.0.2 (2013-09-25)`

.