## A network analysis of the symptoms from the Zung depression scale components, Briganti, Scutari and Linkowski, Psychological Reports (2020)

This is a short HOWTO describing the analysis in “Network Structures of Symptoms from the Zung Depression Scale” by Briganti, Scutari and Linkowski (2020, Psychological Reports).

### Loading the required libraries

> library(qgraph) > library(bootnet) > require(pcalg) > library(bnlearn) > library(NetworkComparisonTest)

### Loading and exploring the data

The data are available and can be downloaded from here. They comprise 1090 observations and 20 variables taking values between 1 and 4.

> data = readRDS("zung.rds") > names(data) = + c("Blue", "Morning", "Crying", "SleepP", "Eat", "Sex", "Weight","Constipated", + "HeartR","Tired", "Mind", "Things", "Restless", "Hope", "Irritable", + "Decision", "Useful", "LifeFull", "Dead", "Enjoy")

Exploring the data, we find that most of the variability is explained by the first 3 to 5 principal components. The
function `cor_auto()`

from the **qgraph** package automatically recognizes that the variables
are on Likert scales and treats them as ordered factors; manually converting them with `as.ordered()`

gives
the same results.

> plot(eigen(cor_auto(data))$values, type = "b", pch = 19)

A correlation graph (from the `qgraph()`

function in **qgraph**) shows that all variables are
positively correlated, with the notable exception of `Useful`

and `Weight`

. A fair number of
correlations appear to be strong in magnitude, but mist are relatively weak.

> qgraph(cor_auto(data))

### Learning an undirected network model

We can build more informative undirected network models than the correlation graphs above using partial
correlations instead of marginal correlation coefficients. One way to learn such a model is to use the *graphical
lasso*, in this case via `qgraph()`

. This method is described in detail in “The Elements of
Statistical Learning” by Hastie, Tibshirani and Friedman and is implemented in the **glasso**
(on which **qgraph** depends.)

> glasso = qgraph(cor(data), layout = "spring", graph = "glasso", + sampleSize = nrow(data), labels = names(data), theme = "colorblind") > qgraph(glasso)

We can also use the first part of the PC algorithm to learn the skeleton of a Bayesian network. This approach
associates arcs with partial correlations like the graphical LASSO, and produces an undirected network which
may be refined to produce a Bayesian network. With the `skeleton()`

function from the **pcalg**
package:

> skel = skeleton(suffStat = list(C = cor(data), n = nrow(data)), indepTest = gaussCItest, + alpha = 0.10, p = 20) > qgraph(skel, layout = "spring", labels = names(data))

The same model is learned by `pc.stable()`

in **bnlearn**, which provides an alternative
implementation of the PC algorithm.

> skel = pc.stable(data, alpha = 0.10, undirected = TRUE)

We can compute some descriptive statistics that summarize the network's structure using `centrality()`

from the **qgraph** package.

> glasso.stats = centrality(glasso) > glasso.stats[c("InDegree", "Betweenness", "Closeness")]

$InDegree Blue Morning Crying SleepP Eat Sex 1.1016105 0.1769498 0.9778115 0.6259414 0.9193397 0.4998780 Weight Constipated HeartR Tired Mind Things 0.3539278 0.3103145 0.7591750 0.9311434 1.1277081 0.6825825 Restless Hope Irritable Decision Useful LifeFull 0.5089186 0.7477680 0.8717946 0.8012730 0.7970176 0.8354880 Dead Enjoy 0.6784831 0.7287845 $Betweenness Blue Morning Crying SleepP Eat Sex 56 0 6 4 52 16 Weight Constipated HeartR Tired Mind Things 4 0 8 34 70 0 Restless Hope Irritable Decision Useful LifeFull 0 0 18 28 40 32 Dead Enjoy 18 14 $Closeness Blue Morning Crying SleepP Eat Sex 0.003229184 0.001448765 0.002796285 0.002367578 0.002958879 0.002549268 Weight Constipated HeartR Tired Mind Things 0.002368336 0.001896349 0.002666788 0.002782175 0.003476562 0.002801425 Restless Hope Irritable Decision Useful LifeFull 0.002476404 0.002505142 0.002823393 0.002949304 0.002702248 0.002624108 Dead Enjoy 0.002809306 0.002867569

> cor(glasso.stats$InDegree, glasso.stats$Betweenness, method = "spearman")

[1] 0.7589347

> cor(glasso.stats$InDegree, glasso.stats$Closeness, method = "spearman")

[1] 0.7789474

> cor(glasso.stats$Closeness, glasso.stats$Betweenness, method = "spearman")

[1] 0.7384639

Another function we considered to estimate an undirected network model is `estimateNetwork()`

from the
**bootnet** package, which provides another implementation of the graphical LASSO.

> glasso2 = estimateNetwork(data, default = "EBICglasso", corMethod = "cor", + corArgs = list(use = "pairwise.complete.obs"))

We can pass the object returned by `estimateNetwork()`

to the `bootnet()`

function to
compute various bootstrap estimates of arc strength and other graphical summaries of the network.

> boot = bootnet(glasso2, ncores = 4, nboots = 200, type = "nonparametric", + verbose = FALSE) > plot(boot, labels = FALSE, order = "sample")

### Learning a directed network model

We can learn a directed network model (that is, a Bayesian network) using the PC algorithm in a similar way,
calling either the `pc()`

function from **pcalg**:

> pc.fit = pc(suffStat = list(C = cor(data), n = nrow(data)), indepTest = gaussCItest, + alpha = 0.05, p = 20)

or `pc.stable()`

, if we are using **bnlearn**:

> pc.fit = pc.stable(data, alpha = 0.05)

In order to improve the stability of the learned network, we perform bootstrap aggregation and model averaging
with `boot.strength()`

and `averaged.network()`

instead of calling `pc.stable()`

directly.

> bootstr = boot.strength(data, R = 100, algorithm = "pc.stable") > bootstr[with(bootstr, strength >= 0.85 & direction >= 0.5), ]

from to strength direction 8 Blue HeartR 0.98 0.5408163 10 Blue Mind 0.93 0.5322581 14 Blue Irritable 0.97 0.8608247 39 Crying Blue 1.00 0.5100000 47 Crying Tired 1.00 0.6350000 71 SleepP Irritable 0.93 0.6774194 95 Eat Enjoy 0.98 0.6530612 100 Sex Eat 1.00 0.6750000 112 Sex LifeFull 0.97 0.5979381 114 Sex Enjoy 0.93 0.8387097 119 Weight Eat 1.00 0.5900000 156 HeartR SleepP 1.00 0.6550000 161 HeartR Tired 1.00 0.6350000 185 Tired Irritable 0.87 0.6091954 195 Mind Eat 0.99 0.5606061 201 Mind Things 1.00 0.5350000 204 Mind Irritable 0.86 0.7500000 205 Mind Decision 1.00 0.5350000 209 Mind Enjoy 0.92 0.6739130 224 Things Decision 0.97 0.5000000 232 Restless SleepP 0.91 0.6703297 238 Restless Tired 1.00 0.6150000 242 Restless Irritable 1.00 0.7200000 297 Decision Things 0.97 0.5000000 301 Decision Useful 1.00 0.5100000 318 Useful Hope 0.92 0.5760870 337 LifeFull Hope 0.99 0.6666667 340 LifeFull Useful 1.00 0.5500000 341 LifeFull Dead 0.85 0.6294118 356 Dead Hope 0.90 0.5500000 361 Dead Enjoy 0.96 0.5156250

> avgnet = averaged.network(bootstr, threshold = 0.85) > avgnet

Random/Generated Bayesian network model: [partially directed graph] nodes: 20 arcs: 30 undirected arcs: 1 directed arcs: 29 average markov blanket size: 4.80 average neighbourhood size: 3.00 average branching factor: 1.45 generation algorithm: Model Averaging significance threshold: 0.85

We can plot the result either with `strength.plot()`

or with `qgraph()`

, for consistency with
previous plots.

> sp = strength.plot(avgnet, bootstr, shape = "ellipse")

> qgraph(sp, layout = "spring", labels = nodes(avgnet))

Finally, we can test the causal effects of variables on each other using the implementation of the IDA algorithm in
the **pcalg** package. For instance, we can check the effect of `Morning`

on
`Sex`

as follows. (`ida()`

identifies variables by position, not by name.)

> ida(6, 2, cov(data), as.graphNEL(avgnet), method = "global")

### A network comparison test

Another interesting question we can investigate from the data is whether there is a difference in the symptoms
between men and women. We have split data sets for men (here) and for women
(here), and we will use the `NCT()`

function in the
**NetworkComparisonTest** package to answer it.

> dataF = readRDS("zungF.rds") > dataM = readRDS("zungM.rds") > > test = NCT(dataM, dataF, it = 500, binary.data = FALSE, test.edges = TRUE, + edges = "all", progressbar = FALSE)

`NCT()`

uses permutation testing in combination with several graph summary statistics (network structure
invariance, global strength invariance, edge invariance) to compare the undirected networks learned from
`dataM`

and `dataM`

using graphical LASSO. (Like that in the `glasso`

object above).

The p-value for maximum difference in edge weights (`nwinv.pval`

) and the p-value for the difference in
global strength are both higher than any threshold you would typically use, leading us to accept the null hypothesis
that there is no significant difference between men and women.

> test$nwinv.pval

[1] 0.378

> test$glstrinv.pval

[1] 0.186

`Wed Sep 16 16:33:38 2020`

with **bnlearn**

`4.6`

and `R version 4.0.2 (2020-06-22)`

.