Title: | Tools for Causal Discovery on Observational Data |
---|---|
Description: | Various tools for inferring causal models from observational data. The package includes an implementation of the temporal Peter-Clark (TPC) algorithm. Petersen, Osler and Ekstrøm (2021) <doi:10.1093/aje/kwab087>. It also includes general tools for evaluating differences in adjacency matrices, which can be used for evaluating performance of causal discovery procedures. |
Authors: | Anne Helby Petersen [aut, cre] |
Maintainer: | Anne Helby Petersen <[email protected]> |
License: | GPL-2 |
Version: | 0.9.4 |
Built: | 2025-01-30 04:23:54 UTC |
Source: | https://github.com/annennenne/causaldisco |
Two adjacency matrices are compared either in terms of adjacencies (type = "adj") or orientations (type = "dir").
adj_confusion(est_amat, true_amat)
est_amat |
The estimated adjacency matrix, or |
true_amat |
The true adjacency matrix, or |
Adjacency comparison: The confusion matrix is a cross-tabulation of adjacencies. Hence, a true positive means that the two inputs agree on the presence of an adjacency. A true negative means that the two inputs agree on no adjacency. A false positive means that est_amat places an adjacency where there should be none. A false negative means that est_amat does not place an adjacency where there should have been one.
Orientation comparison: The orientation confusion matrix is conditional on agreement on adjacency. This means that only adjacencies that are shared in both input matrices are considered, and agreement wrt. orientation is then computed only among these edges that occur in both matrices. A true positive is a correctly placed arrowhead (1), a false positive marks placement of arrowhead (1) where there should have been a tail (0), a false negative marks placement of tail (0) where there should have been an arrowhead (1), and a true negative marks correct placement of a tail (0).
A list with entries $tp (number of true positives), $tn (number of true negatives), $fp (number of false positives), and $fn (number of false negatives).
#############################################################################
# Compare two adjacency matrices ############################################
#############################################################################
x1 <- matrix(c(0, 0, 0, 0,
               1, 0, 1, 0,
               1, 0, 0, 0,
               0, 0, 1, 0), 4, 4, byrow = TRUE)
x2 <- matrix(c(0, 0, 1, 0,
               1, 0, 0, 0,
               0, 0, 0, 0,
               1, 0, 1, 0), 4, 4, byrow = TRUE)

# confusion matrix for adjacencies
confusion(x2, x1)

# confusion matrix for conditional orientations
confusion(x2, x1, type = "dir")

#############################################################################
# Compare estimated cpdag with true adjacency matrix ########################
#############################################################################
# simulate DAG adjacency matrix and Gaussian data
set.seed(123)
x3 <- matrix(c(0, 0, 0, 0,
               1, 0, 1, 0,
               0, 0, 0, 0,
               0, 0, 1, 0), 4, 4, byrow = TRUE)
ex_data <- simGausFromDAG(x3, n = 50)
pcres <- pc(ex_data, sparsity = 0.1, test = corTest)

# compare adjacencies with true amat (x3)
confusion(pcres, x3)

# compare conditional orientations with true amat
confusion(pcres, x3, type = "dir")
If the input is a tpdag or cpdag, the resulting adjacency matrix A is a "from-to" matrix encoded as follows:
- A(i,j) = 1 and A(j,i) = 0 means there is an edge j -> i.
- A(j,i) = 1 and A(i,j) = 0 means there is an edge i -> j.
- A(i,j) = 1 and A(j,i) = 1 means there is an undirected edge between i and j, i - j.
- A(i,j) = 0 and A(j,i) = 0 means there is no edge between i and j.
amat(x)
x |
|
If the input is a tpag or pag, there are four possible entry values: 0 (no edge), 1 (circle), 2 (arrowhead), 3 (tail). It is still encoded as a "from-to" adjacency matrix, which means that e.g. A(i,j) = 1 places a circle in the direction from j to i. For example, if A(i,j) = 1 and A(j,i) = 2, we have that i o-> j. Similarly, A(i,j) = 2 and A(j,i) = 3 mean that i <- j.
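A minimal usage sketch, reusing the tpc() call that appears in the examples elsewhere in this manual:

data(tpcExample)
tpdag1 <- tpc(tpcExample, order = c("child", "youth", "oldage"), sparsity = 0.01)

# extract the adjacency matrix from the fitted tpdag object
amat(tpdag1)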
Convert adjacency matrix to graphNEL object
as.graphNEL(amat)
amat |
An adjacency matrix |
A graphNEL
object, see graphNEL-class
.
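A small sketch; the node names assigned to the simulated adjacency matrix below are illustrative assumptions:

d <- simDAG(4)
rownames(d) <- colnames(d) <- paste0("x", 1:4)  # illustrative node names

# convert the adjacency matrix to a graphNEL object
g <- as.graphNEL(d)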
Computes the average degree, i.e. the number of edges divided by the number of nodes.
average_degree(amat)
amat |
An adjacency matrix |
A numeric.
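For example, applied to a randomly simulated DAG adjacency matrix:

# average degree of a random DAG on 5 nodes
average_degree(simDAG(5))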
Compare edges in two tpdag objects or two tskeleton objects. Note that they should be based on the same variables. Only edge absence/presence is compared, not edge orientation.
compare(x, y = NULL)
x |
First object |
y |
Second object (optional) |
A list with entries: $nedges1 (the number of edges in the first object), $nedges2 (the number of edges in the second object), $psi1 (the test significance level of the first object), $psi2 (the test significance level of the second object), $nadded (the number of additional edges in object 2, relative to object 1), and $nremoved (the number of absent edges in object 2, relative to object 1).
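A usage sketch comparing two TPC fits of the included example data at different sparsity levels:

data(tpcExample)
fit1 <- tpc(tpcExample, order = c("child", "youth", "oldage"), sparsity = 0.01)
fit2 <- tpc(tpcExample, order = c("child", "youth", "oldage"), sparsity = 0.05)

# compare edge presence/absence in the two fitted objects
compare(fit1, fit2)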
Two adjacency matrices are compared either in terms of adjacencies (type = "adj") or orientations (type = "dir").
confusion(est_amat, true_amat, type = "adj")
est_amat |
The estimated adjacency matrix, or |
true_amat |
The true adjacency matrix, or |
type |
String indicating whether the confusion matrix should be computed for adjacencies
( |
Adjacency comparison: The confusion matrix is a cross-tabulation of adjacencies. Hence, a true positive means that the two inputs agree on the presence of an adjacency. A true negative means that the two inputs agree on no adjacency. A false positive means that est_amat places an adjacency where there should be none. A false negative means that est_amat does not place an adjacency where there should have been one.
Orientation comparison: The orientation confusion matrix is conditional on agreement on adjacency. This means that only adjacencies that are shared in both input matrices are considered, and agreement wrt. orientation is then computed only among these edges that occur in both matrices. A true positive is a correctly placed arrowhead (1), a false positive marks placement of arrowhead (1) where there should have been a tail (0), a false negative marks placement of tail (0) where there should have been an arrowhead (1), and a true negative marks correct placement of a tail (0).
A list with entries $tp (number of true positives), $tn (number of true negatives), $fp (number of false positives), and $fn (number of false negatives).
#############################################################################
# Compare two adjacency matrices ############################################
#############################################################################
x1 <- matrix(c(0, 0, 0, 0,
               1, 0, 1, 0,
               1, 0, 0, 0,
               0, 0, 1, 0), 4, 4, byrow = TRUE)
x2 <- matrix(c(0, 0, 1, 0,
               1, 0, 0, 0,
               0, 0, 0, 0,
               1, 0, 1, 0), 4, 4, byrow = TRUE)

# confusion matrix for adjacencies
confusion(x2, x1)

# confusion matrix for conditional orientations
confusion(x2, x1, type = "dir")

#############################################################################
# Compare estimated cpdag with true adjacency matrix ########################
#############################################################################
# simulate DAG adjacency matrix and Gaussian data
set.seed(123)
x3 <- matrix(c(0, 0, 0, 0,
               1, 0, 1, 0,
               0, 0, 0, 0,
               0, 0, 1, 0), 4, 4, byrow = TRUE)
ex_data <- simGausFromDAG(x3, n = 50)
pcres <- pc(ex_data, sparsity = 0.1, test = corTest)

# compare adjacencies with true amat (x3)
confusion(pcres, x3)

# compare conditional orientations with true amat
confusion(pcres, x3, type = "dir")
This function simply calls the gaussCItest
function from the pcalg
package.
corTest(x, y, S, suffStat)
x |
Index of x variable |
y |
Index of y variable |
S |
Index of S variable(s), possibly NULL |
suffStat |
Sufficient statistic; list with data, binary variables and order. |
A numeric, which is the p-value of the test.
Two adjacency matrices are compared either in terms of adjacencies (type = "adj") or orientations (type = "dir").
dir_confusion(est_amat, true_amat)
est_amat |
The estimated adjacency matrix, or |
true_amat |
The true adjacency matrix, or |
Adjacency comparison: The confusion matrix is a cross-tabulation of adjacencies. Hence, a true positive means that the two inputs agree on the presence of an adjacency. A true negative means that the two inputs agree on no adjacency. A false positive means that est_amat places an adjacency where there should be none. A false negative means that est_amat does not place an adjacency where there should have been one.
Orientation comparison: The orientation confusion matrix is conditional on agreement on adjacency. This means that only adjacencies that are shared in both input matrices are considered, and agreement wrt. orientation is then computed only among these edges that occur in both matrices. A true positive is a correctly placed arrowhead (1), a false positive marks placement of arrowhead (1) where there should have been a tail (0), a false negative marks placement of tail (0) where there should have been an arrowhead (1), and a true negative marks correct placement of a tail (0).
A list with entries $tp (number of true positives), $tn (number of true negatives), $fp (number of false positives), and $fn (number of false negatives).
#############################################################################
# Compare two adjacency matrices ############################################
#############################################################################
x1 <- matrix(c(0, 0, 0, 0,
               1, 0, 1, 0,
               1, 0, 0, 0,
               0, 0, 1, 0), 4, 4, byrow = TRUE)
x2 <- matrix(c(0, 0, 1, 0,
               1, 0, 0, 0,
               0, 0, 0, 0,
               1, 0, 1, 0), 4, 4, byrow = TRUE)

# confusion matrix for adjacencies
confusion(x2, x1)

# confusion matrix for conditional orientations
confusion(x2, x1, type = "dir")

#############################################################################
# Compare estimated cpdag with true adjacency matrix ########################
#############################################################################
# simulate DAG adjacency matrix and Gaussian data
set.seed(123)
x3 <- matrix(c(0, 0, 0, 0,
               1, 0, 1, 0,
               0, 0, 0, 0,
               0, 0, 1, 0), 4, 4, byrow = TRUE)
ex_data <- simGausFromDAG(x3, n = 50)
pcres <- pc(ex_data, sparsity = 0.1, test = corTest)

# compare adjacencies with true amat (x3)
confusion(pcres, x3)

# compare conditional orientations with true amat
confusion(pcres, x3, type = "dir")
Two adjacency matrices are compared either in terms of adjacencies (type = "adj") or orientations (type = "dir").
dir_confusion_original(est_amat, true_amat)
est_amat |
The estimated adjacency matrix, or |
true_amat |
The true adjacency matrix, or |
This is an old version of the function, included for possible backwards compatibility. Edges are scored as follows: A correctly unoriented edge counts as a true negative (TN). An undirected edge that should have been directed counts as a false negative (FN). A directed edge that should have been undirected counts as a false positive (FP). A directed edge oriented in the correct direction counts as a true positive (TP). A directed edge oriented in the incorrect direction counts as both a false positive (FP) and a false negative (FN).
A list with entries $tp (number of true positives), $tn (number of true negatives), $fp (number of false positives), and $fn (number of false negatives).
#############################################################################
# Compare two adjacency matrices ############################################
#############################################################################
x1 <- matrix(c(0, 0, 0, 0,
               1, 0, 1, 0,
               1, 0, 0, 0,
               0, 0, 1, 0), 4, 4, byrow = TRUE)
x2 <- matrix(c(0, 0, 1, 0,
               1, 0, 0, 0,
               0, 0, 0, 0,
               1, 0, 1, 0), 4, 4, byrow = TRUE)

# confusion matrix for adjacencies
confusion(x2, x1)

# confusion matrix for conditional orientations
confusion(x2, x1, type = "dir")

#############################################################################
# Compare estimated cpdag with true adjacency matrix ########################
#############################################################################
# simulate DAG adjacency matrix and Gaussian data
set.seed(123)
x3 <- matrix(c(0, 0, 0, 0,
               1, 0, 1, 0,
               0, 0, 0, 0,
               0, 0, 1, 0), 4, 4, byrow = TRUE)
ex_data <- simGausFromDAG(x3, n = 50)
pcres <- pc(ex_data, sparsity = 0.1, test = corTest)

# compare adjacencies with true amat (x3)
confusion(pcres, x3)

# compare conditional orientations with true amat
confusion(pcres, x3, type = "dir")
Produces a list of edges from an adjacency matrix.
edges(amat)
amat |
An adjacency matrix. |
A list consisting of two lists: One for oriented edges ($dir), and one for unoriented edges ($undir).
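A minimal sketch on a simulated DAG adjacency matrix:

# list oriented and unoriented edges of a random DAG on 4 nodes
edges(simDAG(4))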
Extracts the adjacency matrix from an EssGraph-class
object. This object is returned
by score-based causal discovery algorithms in the pcalg package.
essgraph2amat(essgraph, p = length(essgraph$field(".nodes")))
essgraph |
An |
p |
The number of nodes in the graph |
An adjacency matrix (square matrix with 0/1 entries).
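A usage sketch, reusing the gausCorScore example from this manual; pcalg::ges() returns a list whose $essgraph entry is the fitted EssGraph object:

x1 <- rnorm(100)
x2 <- rnorm(100)
x3 <- x1 + x2 + rnorm(100)
cmat <- cor(data.frame(x1, x2, x3))

# score-based search, then extract the estimated adjacency matrix
gesres <- pcalg::ges(gausCorScore(cmat, n = 100))
essgraph2amat(gesres$essgraph)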
Applies several different metrics to evaluate difference between estimated and true adjacency matrices. Intended to be used to evaluate performance of causal discovery algorithms.
evaluate(est, true, metrics, ...)
est |
Estimated adjacency matrix/matrices. |
true |
True adjacency matrix/matrices. |
metrics |
List of metrics, see details. |
... |
Further arguments that depend on input type. Currently only |
Two options for input are available: Either est and true can be two adjacency matrices, or they can be two arrays of adjacency matrices. The arrays should stack n adjacency matrices of dimension p x p, where n is the number of matrices and p is the number of nodes/variables.
The metrics should be given as a list with slots $adj
, $dir
and
$other
. Metrics under $adj
are applied to the adjacency confusion
matrix, while metrics under $dir
are applied to the conditional orientation
confusion matrix (see confusion). Metrics under $other
are applied
without computing confusion matrices first.
Available metrics to be used with confusion matrices are precision, recall, specificity, FOR, FDR, NPV, F1 and G1. The user can supply custom metrics as well: They need to have the confusion matrix as their first argument and should return a numeric.
The available metric to be used as "other" is shd. The user
can supply custom metrics as well: They need to have arguments est_amat
and true_amat
,
where the former is the estimated adjacency matrix and the latter is the true adjacency matrix. The
metrics should return a numeric.
A data.frame with one column for each computed metric and one row per evaluated
matrix pair. Adjacency metrics are prefixed with "adj_", orientation metrics are prefixed
with "dir_", other metrics do not get a prefix. If the first argument is a matrix, list.out = TRUE
can be used to change the return object to a list instead. This list will contain three lists, where
adjacency, orientation and other metrics are reported, respectively.
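A usage sketch, reusing the adjacency matrices from the confusion examples; specifying the built-in metrics by name in the metrics list is an assumption here, and custom metric functions may be supplied instead (see details):

x1 <- matrix(c(0, 0, 0, 0,
               1, 0, 1, 0,
               1, 0, 0, 0,
               0, 0, 1, 0), 4, 4, byrow = TRUE)
x2 <- matrix(c(0, 0, 1, 0,
               1, 0, 0, 0,
               0, 0, 0, 0,
               1, 0, 1, 0), 4, 4, byrow = TRUE)

# evaluate x2 against the "true" matrix x1
# (metric specification by name is assumed; see details for alternatives)
evaluate(x2, x1,
         metrics = list(adj = c("precision", "recall"),
                        dir = c("precision", "recall"),
                        other = c("shd")))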
Applies several different metrics to evaluate difference between estimated and true adjacency matrices. Intended to be used to evaluate performance of causal discovery algorithms.
## S3 method for class 'array'
evaluate(est, true, metrics, ...)
est |
Estimated adjacency matrix/matrices. |
true |
True adjacency matrix/matrices. |
metrics |
List of metrics, see details. |
... |
Further arguments that depend on input type. Currently only |
Two options for input are available: Either est and true can be two adjacency matrices, or they can be two arrays of adjacency matrices. The arrays should stack n adjacency matrices of dimension p x p, where n is the number of matrices and p is the number of nodes/variables.
The metrics should be given as a list with slots $adj
, $dir
and
$other
. Metrics under $adj
are applied to the adjacency confusion
matrix, while metrics under $dir
are applied to the conditional orientation
confusion matrix (see confusion). Metrics under $other
are applied
without computing confusion matrices first.
Available metrics to be used with confusion matrices are precision, recall, specificity, FOR, FDR, NPV, F1 and G1. The user can supply custom metrics as well: They need to have the confusion matrix as their first argument and should return a numeric.
The available metric to be used as "other" is shd. The user
can supply custom metrics as well: They need to have arguments est_amat
and true_amat
,
where the former is the estimated adjacency matrix and the latter is the true adjacency matrix. The
metrics should return a numeric.
A data.frame with one column for each computed metric and one row per evaluated
matrix pair. Adjacency metrics are prefixed with "adj_", orientation metrics are prefixed
with "dir_", other metrics do not get a prefix. If the first argument is a matrix, list.out = TRUE
can be used to change the return object to a list instead. This list will contain three lists, where
adjacency, orientation and other metrics are reported, respectively.
Applies several different metrics to evaluate difference between estimated and true adjacency matrices. Intended to be used to evaluate performance of causal discovery algorithms.
## S3 method for class 'matrix'
evaluate(est, true, metrics, list.out = FALSE, ...)
est |
Estimated adjacency matrix/matrices. |
true |
True adjacency matrix/matrices. |
metrics |
List of metrics, see details. |
list.out |
If |
... |
Further arguments that depend on input type. Currently only |
Two options for input are available: Either est and true can be two adjacency matrices, or they can be two arrays of adjacency matrices. The arrays should stack n adjacency matrices of dimension p x p, where n is the number of matrices and p is the number of nodes/variables.
The metrics should be given as a list with slots $adj
, $dir
and
$other
. Metrics under $adj
are applied to the adjacency confusion
matrix, while metrics under $dir
are applied to the conditional orientation
confusion matrix (see confusion). Metrics under $other
are applied
without computing confusion matrices first.
Available metrics to be used with confusion matrices are precision, recall, specificity, FOR, FDR, NPV, F1 and G1. The user can supply custom metrics as well: They need to have the confusion matrix as their first argument and should return a numeric.
The available metric to be used as "other" is shd. The user
can supply custom metrics as well: They need to have arguments est_amat
and true_amat
,
where the former is the estimated adjacency matrix and the latter is the true adjacency matrix. The
metrics should return a numeric.
A data.frame with one column for each computed metric and one row per evaluated
matrix pair. Adjacency metrics are prefixed with "adj_", orientation metrics are prefixed
with "dir_", other metrics do not get a prefix. If the first argument is a matrix, list.out = TRUE
can be used to change the return object to a list instead. This list will contain three lists, where
adjacency, orientation and other metrics are reported, respectively.
Applies several different metrics to evaluate difference between estimated and true adjacency matrices. Intended to be used to evaluate performance of causal discovery algorithms.
## S3 method for class 'tamat'
evaluate(est, true, metrics, ...)
est |
Estimated adjacency matrix/matrices. |
true |
True adjacency matrix/matrices. |
metrics |
List of metrics, see details. |
... |
Further arguments that depend on input type. Currently only |
Two options for input are available: Either est and true can be two adjacency matrices, or they can be two arrays of adjacency matrices. The arrays should stack n adjacency matrices of dimension p x p, where n is the number of matrices and p is the number of nodes/variables.
The metrics should be given as a list with slots $adj
, $dir
and
$other
. Metrics under $adj
are applied to the adjacency confusion
matrix, while metrics under $dir
are applied to the conditional orientation
confusion matrix (see confusion). Metrics under $other
are applied
without computing confusion matrices first.
Available metrics to be used with confusion matrices are precision, recall, specificity, FOR, FDR, NPV, F1 and G1. The user can supply custom metrics as well: They need to have the confusion matrix as their first argument and should return a numeric.
The available metric to be used as "other" is shd. The user
can supply custom metrics as well: They need to have arguments est_amat
and true_amat
,
where the former is the estimated adjacency matrix and the latter is the true adjacency matrix. The
metrics should return a numeric.
A data.frame with one column for each computed metric and one row per evaluated
matrix pair. Adjacency metrics are prefixed with "adj_", orientation metrics are prefixed
with "dir_", other metrics do not get a prefix. If the first argument is a matrix, list.out = TRUE
can be used to change the return object to a list instead. This list will contain three lists, where
adjacency, orientation and other metrics are reported, respectively.
Computes F1 score from a confusion matrix, see confusion. The F1 score is defined as 2TP/(2TP + FP + FN), where TP are true positives, FP are false positives, and FN are false negatives. If TP + FP + FN = 0, 1 is returned.
F1(confusion)
confusion |
Confusion matrix as obtained from confusion |
A numeric in [0,1].
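For example, combined with confusion; the other confusion-based metric helpers (precision, recall, specificity, FOR, FDR, NPV, G1) are called the same way:

x1 <- matrix(c(0, 0, 0, 0,
               1, 0, 1, 0,
               1, 0, 0, 0,
               0, 0, 1, 0), 4, 4, byrow = TRUE)
x2 <- matrix(c(0, 0, 1, 0,
               1, 0, 0, 0,
               0, 0, 0, 0,
               1, 0, 1, 0), 4, 4, byrow = TRUE)

# F1 score for the adjacency confusion matrix of x2 vs. x1
F1(confusion(x2, x1))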
This is a wrapper function for the fci
function as
implemented in the pcalg package. All computations are carried out by the
pcalg package. The default output object however matches that of tfci, see
this function for details about how the adjacency matrix is encoded.
fci(
  data = NULL,
  sparsity = 10^(-1),
  test = regTest,
  suffStat = NULL,
  method = "stable.fast",
  methodNA = "none",
  methodOri = "conservative",
  output = "pag",
  varnames = NULL,
  ...
)
data |
A data.frame with data. All variables should be assigned to exactly one period by prefixing them with the period name (see example below). |
sparsity |
The sparsity level to be used for independence testing (i.e. significance level threshold to use for each test). |
test |
A procedure for testing conditional independence.
The default, |
suffStat |
Sufficient statistic. If this argument is supplied, the sufficient statistic is not computed from the inputted data. The format and contents of the sufficient statistic depends on which test is being used. |
method |
Which method to use for skeleton construction, must be
|
methodNA |
Method for handling missing information ( |
methodOri |
Method for handling conflicting separating sets when orienting
edges, must be one of |
output |
One of |
varnames |
A character vector of variable names. It only needs to be supplied
if the |
... |
Further optional arguments which are passed to
|
# simulate linear Gaussian data w unobserved variable L1
n <- 100
L1 <- rnorm(n)
X1 <- rnorm(n)
X2 <- L1 + X1 + rnorm(n)
X3 <- X1 + rnorm(n)
X4 <- X3 + L1 + rnorm(n)
d <- data.frame(p1_X1 = X1, p1_X2 = X2, p2_X3 = X3, p2_X4 = X4)

# use FCI algorithm to recover PAG
fci(d, test = corTest)
Computes false discovery rate from a confusion matrix, see confusion. False discovery rate is defined as FP/(FP + TP), where FP are false positives and TP are true positives. If FP + TP = 0, 0 is returned.
FDR(confusion)
confusion |
Confusion matrix as obtained from confusion |
A numeric in [0,1].
Computes false omission rate from a confusion matrix, see confusion. False omission rate is defined as FN/(FN + TN), where FN are false negatives and TN are true negatives. If FN + TN = 0, 0 is returned.
FOR(confusion)
confusion |
Confusion matrix as obtained from confusion |
A numeric in [0,1].
Computes G1 score from a confusion matrix, see confusion. G1 score is F1 score with reversed roles of 0/1 classifications, see Petersen et al. 2022. The G1 score is defined as 2TN/(2TN + FP + FN), where TN are true negatives, FP are false positives, and FN are false negatives. If TN + FN + FP = 0, 1 is returned.
G1(confusion)
confusion |
Confusion matrix as obtained from confusion |
A numeric in [0,1].
Petersen, Anne Helby, et al. "Causal discovery for observational sciences using supervised machine learning." arXiv preprint arXiv:2202.12813 (2022).
The score is intended to be used with score-based causal discovery algorithms
from the pcalg package. It is identical to the GaussL0penObsScore-class
,
except that it takes in a correlation matrix instead of the full data set.
GaussL0penObsScore-class
.
gausCorScore(cormat, n, p = NULL, lambda = NULL, ...)
cormat |
A correlation matrix. Needs to be symmetric. |
n |
The number of observations in the dataset that the correlation matrix was computed from. |
p |
The number of variables. This is inferred from the cormat if not supplied. |
lambda |
Penalty to use for the score. If |
... |
Other arguments passed along to |
A Score
object (S4), see Score-class
.
# Simulate data and compute correlation matrix
x1 <- rnorm(100)
x2 <- rnorm(100)
x3 <- x1 + x2 + rnorm(100)
d <- data.frame(x1, x2, x3)
cmat <- cor(d)

# Use gausCorScore with pcalg::ges()
pcalg::ges(gausCorScore(cmat, n = 100))
The resulting score object provides a constructor and a method for calculating the local score of a vertex and its parents.
Convert graphNEL object to adjacency matrix
graph2amat(graph, toFrom = TRUE, type = "pdag")
graph |
A graphNEL object. |
toFrom |
Logical indicating whether the resulting adjacency matrix is "to-from" (default), or "from-to", see details. |
type |
The type of adjacency matrix, must be one of |
A "to-from" pdag
adjacency matrix is encoded as follows: A(i,j) = 1 and A(j,i) = 0
means there is an edge i -> j. A(j,i) = 1 and A(i,j) = 0 means there is an edge j -> i.
A(i,j) = 1 and A(j,i) = 1 means there is an undirected edge between i and j, i - j.
A(i,j) = 0 and A(j,i) = 0 means there is no edge between i and j.
A "from-to" adjacency matrix is the transpose of a "to-from" adjacency matrix.
A "from-to" pdag
adjacency matrix is hence encoded as follows: A(i,j) = 1 and A(j,i) = 0
means there is an edge j -> i. A(j,i) = 1 and A(i,j) = 0 means there is an edge i -> j.
A(i,j) = 1 and A(j,i) = 1 means there is an undirected edge between i and j, i - j.
A(i,j) = 0 and A(j,i) = 0 means there is no edge between i and j.
See amat for details about how an ag
adjacency matrix is encoded.
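A small sketch converting a graphNEL object (built here from a simulated DAG with illustrative node names) back to an adjacency matrix:

d <- simDAG(4)
rownames(d) <- colnames(d) <- paste0("x", 1:4)  # illustrative node names
g <- as.graphNEL(d)

# convert the graphNEL object to a "to-from" pdag adjacency matrix (default)
graph2amat(g)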
Check for CPDAG
is_cpdag(amat)
amat |
An adjacency matrix |
Check: Is adjacency matrix proper CPDAG? See isValidGraph
for
definition.
A logical.
Check for PDAG
is_pdag(amat)
amat |
An adjacency matrix |
Check: Is adjacency matrix proper PDAG? See isValidGraph
for
definition.
A logical.
Generate Latex tikz code for plotting a temporal DAG, PDAG or PAG.
maketikz(
  model,
  xjit = 2,
  yjit = 2,
  markperiods = TRUE,
  xpgap = 4,
  annotateEdges = NULL,
  addAxis = TRUE,
  varLabels = NULL,
  periodLabels = NULL,
  annotationLabels = NULL,
  clipboard = TRUE,
  rawout = FALSE,
  colorAnnotate = NULL,
  bendedges = FALSE
)
model |
|
xjit |
How much should nodes within a period be jittered horizontally. |
yjit |
Vertical distance between nodes within a period. |
markperiods |
If |
xpgap |
Horizontal gap between different periods. |
annotateEdges |
If |
addAxis |
If |
varLabels |
Optional labels for nodes (variables). Should be given as a named list, where
the name is the variable name, and the entry is the label, e.g. |
periodLabels |
Optional labels for periods. Should be given as a named list, where
the name is the period name (as stored in the |
annotationLabels |
Optional labels for edge annotations. Only used if |
clipboard |
If |
rawout |
If |
colorAnnotate |
Named list of colors to use to mark edge annotations instead of labels. This
overrules |
bendedges |
If |
Note that it is necessary to read in relevant tikz libraries in the
Latex preamble. The relevant lines of code are (depending a bit on parameter settings): \usepackage{tikz}
\usetikzlibrary{arrows.meta,arrows,shapes,decorations,automata,backgrounds,petri}
\usepackage{pgfplots}
Silently returns a character vector with lines of tikz code. The function
furthermore has a side-effect. If clipboard = TRUE
, the side-effect is that the tikz
code is also copied to the clipboard. If clipboard = FALSE
, the tikz code is instead printed
in the console.
# Make tikz figure code from tpdag, print code to screen
data(tpcExample)
tpdag1 <- tpc(tpcExample, order = c("child", "youth", "oldage"),
              sparsity = 0.01, test = corTest)
maketikz(tpdag1, clipboard = FALSE)

# Make tikz figure code from tamat, copy code to clipboard
thisdag <- simDAG(5)
rownames(thisdag) <- colnames(thisdag) <- c("child_x", "child_y", "child_z",
                                            "adult_x", "adult_y")
thistamat <- tamat(thisdag, order = c("child", "adult"))

## Not run:
maketikz(thistamat)
## End(Not run)
Computes the number of edges a graph with p nodes will have if it is fully connected.
maxnedges(p)
p |
The number of nodes in the graph |
A numeric.
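Assuming edges are counted as adjacencies (node pairs), a fully connected graph on p nodes has p*(p-1)/2 edges; a minimal illustration:

# number of edges in a fully connected graph on 4 nodes
maxnedges(4)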
Computes the number of different possible DAGs that can be constructed over a given number of nodes.
nDAGs(p)
p |
The number of nodes. |
A numeric.
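A minimal illustration (there are 25 distinct DAGs over 3 labeled nodes):

# number of possible DAGs over 3 nodes
nDAGs(3)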
Counts the number of edges in an adjacency matrix.
nedges(amat)
amat |
An adjacency matrix |
A numeric (non-negative integer).
Computes negative predictive value from a confusion matrix, see confusion. Negative predictive value is defined as TN/(TN + FN), where TN are true negatives and FN are false negatives. If TN + FN = 0, 0 is returned.
NPV(confusion)
confusion |
Confusion matrix as obtained from confusion |
A numeric in [0,1].
This is a wrapper function for the pc
function as
implemented in the pcalg package. All computations are carried out by the
pcalg package.
pc(
  data = NULL,
  sparsity = 10^(-1),
  test = regTest,
  suffStat = NULL,
  method = "stable.fast",
  methodNA = "none",
  methodOri = "conservative",
  output = "cpdag",
  varnames = NULL,
  conservative = TRUE,
  ...
)
data |
A data.frame with data. All variables should be assigned to exactly one period by prefixing them with the period name (see example below). |
sparsity |
The sparsity level to be used for independence testing (i.e. significance level threshold to use for each test). |
test |
A procedure for testing conditional independence.
The default, |
suffStat |
Sufficient statistic. If this argument is supplied, the sufficient statistic is not computed from the inputted data. The format and contents of the sufficient statistic depends on which test is being used. |
method |
Which method to use for skeleton construction, must be
|
methodNA |
Method for handling missing information ( |
methodOri |
Method for handling conflicting separating sets when orienting
edges, must be one of |
output |
One of |
varnames |
A character vector of variable names. It only needs to be supplied
if the |
conservative |
Logical, if |
... |
Further optional arguments which are passed to
|
Note that all independence test procedures implemented
in the pcalg
package may be used, see pc
.
The methods for handling missing information require that the data
,
rather than the suffStat
argument is used for inputting data; the latter
assumes no missing information and hence always sets methodNA = "none"
.
If the test is corTest
, test-wise deletion is performed when computing the
sufficient statistic (correlation matrix) (so for each pair of variables, only
complete cases are used). If the test is regTest
, test-wise deletion
is performed for each conditional independence test instead.
A tpdag
or tskeleton
object. Both return types are
S3 objects, i.e., lists with entries: $amat
(the estimated adjacency
matrix), $order
(character vector with the order, as inputted to
this function), $psi
(the significance level used for testing), and
$ntests
(the number of tests conducted).
# PC on included example data, use sparsity psi = 0.01, default test (regression-based
# information loss):
data(tpcExample)
pc(tpcExample, sparsity = 0.01)

# PC on included example data, use sparsity psi = 0.01, use test for vanishing partial
# correlations:
data(tpcExample)
pc(tpcExample, sparsity = 0.01, test = corTest)
Plot partial ancestral graph (PAG)
## S3 method for class 'pag'
plot(x, ...)
x |
pag object
to be plotted (as outputted from |
... |
Currently not in use. |
No return value, the function is called for its side-effects (plotting).
This code is a modification of the fciAlgo plotting method implemented in the pcalg package.
# simulate linear Gaussian data w unobserved variable L1
n <- 100
L1 <- rnorm(n)
X1 <- rnorm(n)
X2 <- L1 + X1 + rnorm(n)
X3 <- X1 + rnorm(n)
X4 <- X3 + L1 + rnorm(n)
d <- data.frame(p1_X1 = X1, p1_X2 = X2, p2_X3 = X3, p2_X4 = X4)

# use FCI algorithm to recover PAG
res <- fci(d, test = corTest)

# plot
plot(res)
Plot adjacency matrix with order information
## S3 method for class 'tamat'
plot(x, ...)
x |
tamat (temporal adjacency matrix) object to be plotted
(as outputted from |
... |
Further plotting arguments passed along to |
No return value, the function is called for its side-effects (plotting).
Plot temporal partial ancestral graph (TPAG)
## S3 method for class 'tpag'
plot(x, ...)
x |
tpag object
to be plotted (as outputted from |
... |
Currently not in use. |
No return value, the function is called for its side-effects (plotting).
This code is a modification of the fciAlgo plotting method implemented in the pcalg package.
# simulate linear Gaussian data w unobserved variable L1
n <- 100
L1 <- rnorm(n)
X1 <- rnorm(n)
X2 <- L1 + X1 + rnorm(n)
X3 <- X1 + rnorm(n)
X4 <- X3 + L1 + rnorm(n)
d <- data.frame(p1_X1 = X1, p1_X2 = X2, p2_X3 = X3, p2_X4 = X4)

# use TFCI algorithm to recover TPAG
res <- tfci(d, order = c("p1", "p2"), test = corTest)

# plot
plot(res)
Plot temporal partially directed acyclic graph (TPDAG)
## S3 method for class 'tpdag'
plot(x, ...)
x |
tpdag (temporal partially directed acyclic graph) object
to be plotted (as outputted from |
... |
Further plotting arguments passed along to |
No return value, the function is called for its side-effects (plotting).
Plot temporal skeleton
## S3 method for class 'tskeleton'
plot(x, ...)
x |
tskeleton (temporal skeleton) object to be plotted
(as outputted from |
... |
Further plotting arguments passed along to |
No return value, the function is called for its side-effects (plotting).
Plots tpdag, tskeleton and tamat objects.
plotTempoMech(
  x,
  addTimeAxis = TRUE,
  addPsi = TRUE,
  varLabels = NULL,
  periodLabels = NULL,
  colors = NULL,
  ...
)
x |
The tpdag/tskeleton or tamat to plot. |
addTimeAxis |
Logical indicating whether a time axis should be added to the plot. |
addPsi |
Logical indicating whether the sparsity level should be added to the plot. |
varLabels |
A named list of variable labels. |
periodLabels |
A character vector with labels for periods. |
colors |
A character vector with colors to use for marking periods. Should have at least as many elements as the numbers of periods. |
... |
Additional arguments passed to |
No return value, the function is called for its side-effects (plotting).
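A usage sketch on a TPC fit of the included example data; plotting is wrapped in a not-run block since it opens a graphics device and may require suggested plotting packages:

## Not run:
data(tpcExample)
tpdag1 <- tpc(tpcExample, order = c("child", "youth", "oldage"), sparsity = 0.01)
plotTempoMech(tpdag1)
## End(Not run)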
Computes precision (aka positive predictive value) from a confusion matrix, see confusion. Precision is defined as TP/(TP + FP), where TP are true positives and FP are false positives. If TP + FP = 0, 0 is returned.
precision(confusion)
confusion |
Confusion matrix as obtained from confusion |
A numeric in [0,1].
Convert a matrix of probabilities into an adjacency matrix
probmat2amat(
  probmat,
  threshold,
  method = "cutoff",
  keep_vnames = TRUE,
  graph_criterion = "pdag",
  deletesym = FALSE
)
probmat |
Square matrix of probabilities. |
threshold |
Value between 0 and 1. Any probabilities lower than this value will be set to 0 (no arrowhead). |
method |
Either |
keep_vnames |
If |
graph_criterion |
Which criterion to check if the output graph fulfills for the bpco
method. Should be one of |
deletesym |
If |
Two methods for converting the probability matrix into an adjacency
matrix are implemented. First, the cutoff-method (method = "cutoff"
) simply
uses a threshold value and sets all values below that to zero in the outputted
adjacency matrix. No checks are performed to ensure that the resulting
matrix is a proper dag/pdag/cpdag adjacency matrix. Second, the backwards
PC orientation method (method = "bpco"
) first uses a cutoff, and then
sets further elements to zero until the resulting matrix can be converted into
a proper adjacency matrix (using the graph criterion specified in the
graph_criterion
argument) by applying the PC algorithm orientation rules.
See Petersen et al. 2022 for further details.
A square matrix of probabilities (all entries in [0,1]).
Petersen, Anne Helby, et al. "Causal discovery for observational sciences using supervised machine learning." arXiv preprint arXiv:2202.12813 (2022).
# Make random probability matrix that can be
# converted into adjacency matrix
pmat <- matrix(runif(25, 0, 1), 5, 5)
diag(pmat) <- 0

# Convert to adjacency matrix using cutoff-method (threshold = 0.5)
probmat2amat(pmat, threshold = 0.5)

# Convert to adjacency matrix using BPCO-method (threshold = 0.5)
probmat2amat(pmat, threshold = 0.5, method = "bpco")
Computes recall from a confusion matrix, see confusion. Recall is defined as TP/(TP + FN), where TP are true positives and FN are false negatives. If TP + FN = 0, 0 is returned.
recall(confusion)
confusion |
Confusion matrix as obtained from confusion |
A numeric in [0,1].
We test whether x and y are associated, given S, using a generalized linear model.
regTest(x, y, S, suffStat)
x |
Index of x variable |
y |
Index of y variable |
S |
Index of S variable(s), possibly NULL |
suffStat |
Sufficient statistic; list with data, binary variables and order. |
All included variables should be either numeric or binary. If y is binary, a logistic regression model is fitted. If y is numeric, a linear regression model is fitted. x and S are included as explanatory variables. Any numeric variables among x and S are modeled with spline expansions (natural splines, 3 df). This model is then tested against the corresponding model where x (including a possible spline expansion) has been left out, using a likelihood ratio test.
The model is fitted in both directions (interchanging the roles
of x
and y
). The final p-value is the maximum of the two
obtained p-values.
A numeric, which is the p-value of the test.
Computes the structural hamming distance between two adjacency matrices. This implementation
is a modification of the shd
function from the pcalg package, but here we
avoid working on the heavy graphNEL
objects for representing graphs that are used in the
pcalg package.
shd(est_amat, true_amat)
est_amat |
Estimated adjacency matrix |
true_amat |
True adjacency matrix |
Note that the function is symmetric in the two inputted adjacency matrices.
A numeric (a non-negative integer).
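For example, applied to the two adjacency matrices from the confusion examples:

x1 <- matrix(c(0, 0, 0, 0,
               1, 0, 1, 0,
               1, 0, 0, 0,
               0, 0, 1, 0), 4, 4, byrow = TRUE)
x2 <- matrix(c(0, 0, 1, 0,
               1, 0, 0, 0,
               0, 0, 0, 0,
               1, 0, 1, 0), 4, 4, byrow = TRUE)

# structural Hamming distance between the two matrices
shd(x2, x1)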
Simulates a random directed acyclic graph adjacency (DAG) matrix with the provided edge sparsity. The edge sparsity is the percentage of edges that are absent, relative to a fully connected DAG.
simDAG(p, sparsity = NULL, sparsityLim = c(0, 0.8), permute = TRUE)
p |
The number of nodes. |
sparsity |
If |
sparsityLim |
A vector of two numerics, both must be in [0,1]. |
permute |
If |
An adjacency matrix.
# Simulate a DAG adjacency matrix with 5 nodes
simDAG(5)
Simulates a jointly Gaussian dataset given a DAG adjacency matrix ("from-to" encoding, see amat for details). The data is simulated using linear structural equations and the parameters (residual standard deviations and regression coefficients) are sampled from chosen intervals.
simGausFromDAG(
  amat,
  n,
  regparLim = c(0.5, 2),
  resSDLim = c(0.1, 1),
  pnegRegpar = 0.4,
  standardize = FALSE
)
amat |
An adjacency matrix. |
n |
The number of observations that should be simulated. |
regparLim |
The interval from which regression parameters are sampled. |
resSDLim |
The interval from which residual standard deviations are sampled. |
pnegRegpar |
The probability of sampling a negative regression parameter. |
standardize |
If |
A variable X is simulated as X = b1*P1 + ... + bk*Pk + e, where P1, ..., Pk are the parents of X in the DAG and b1, ..., bk are the sampled regression coefficients. The residual, e, is drawn from a normal distribution with the sampled residual standard deviation.
A data.frame of identically distributed simulated observations.
# Simulate DAG adjacency matrix with 6 nodes
ex_dag <- simDAG(6)

# Simulate Gaussian data (100 iid observations)
ex_data <- simGausFromDAG(ex_dag, n = 100)
Computes specificity from a confusion matrix, see confusion. Specificity is defined as TN/(TN + FP), where TN are true negatives and FP are false positives. If TN + FP = 0, 0 is returned.
specificity(confusion)
confusion |
Confusion matrix as obtained from confusion |
A numeric in [0,1].
Make a temporal adjacency matrix
tamat(amat, order, type = NULL)
amat |
Adjacency matrix. Row names and column names should be identical and be the names of the variables/nodes. Variable names should be prefixed with their period, e.g. "child_x" for variable "x" at period "child" |
order |
A character vector with the periods in their order. |
type |
The type of adjacency matrix, must be one of |
A tamat object, which is a matrix with an "order" attribute (a character vector listing the temporal order of the variables in the adjacency matrix).
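For example, following the construction used in the maketikz examples:

thisdag <- simDAG(5)
rownames(thisdag) <- colnames(thisdag) <- c("child_x", "child_y", "child_z",
                                            "adult_x", "adult_y")

# declare the temporal order of the two periods
thistamat <- tamat(thisdag, order = c("child", "adult"))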
Use a modification of the FCI algorithm that makes use of background knowledge in the format of a partial ordering. This may for instance come about when variables can be assigned to distinct tiers or periods (i.e., a temporal ordering).
tfci(
  data = NULL,
  order,
  sparsity = 10^(-1),
  test = regTest,
  suffStat = NULL,
  method = "stable.fast",
  methodNA = "none",
  methodOri = "conservative",
  varnames = NULL,
  ...
)
data |
A data.frame with data. All variables should be assigned to exactly one period by prefixing them with the period name (see example below). |
order |
A character vector with period-prefixes in their temporal order (see example below). |
sparsity |
The sparsity level to be used for independence testing (i.e. significance level threshold to use for each test). |
test |
A procedure for testing conditional independence.
The default, |
suffStat |
Sufficient statistic. If this argument is supplied, the sufficient statistic is not computed from the inputted data. The format and contents of the sufficient statistic depends on which test is being used. |
method |
Which method to use for skeleton construction, must be
|
methodNA |
Method for handling missing information ( |
methodOri |
Method for handling conflicting separating sets when orienting
edges, must be one of |
varnames |
A character vector of variable names. It only needs to be supplied
if the |
... |
Further optional arguments which are passed to
|
The temporal/tiered background information enters several places in the TFCI algorithm: 1) In the skeleton construction phase, when looking for separating sets Z between two variables X and Y, Z is not allowed to contain variables that are strictly after both X and Y in the temporal order. 2) This also applies to the subsequent phase where the algorithm searches for possible D-SEP sets. 3) Prior to other orientation steps, any cross-tier edges get an arrowhead placed at their latest node.
After this, the usual FCI orientation rules are applied, see udag2pag for details.
The default output is a tpag
object. This is an
S3 object, i.e., a list, with entries: $tamat
(the estimated adjacency
matrix), $order
(character vector with the order, as inputted to
this function), $psi
(the significance level used for testing), and
$ntests
(the number of tests conducted).
The adjacency matrix A has four possible entry values: 0 (no edge), 1 (circle), 2 (arrowhead), 3 (tail). It is encoded as a "from-to" adjacency matrix, which means that e.g. A(i,j) = 1 places a circle in the direction from j to i. For example, if A(i,j) = 1 and A(j,i) = 2, we have that i o-> j. Similarly, A(i,j) = 2 and A(j,i) = 3 mean that i <- j. Note that this is a transposed version of the adjacency matrix outputted for fciAlgo objects from the pcalg package, which is "to-from". But it is consistent with the "from-to" adjacency matrices used for pcAlgo objects from the same package.
Anne Helby Petersen, Qixiang Chen, and Daniel Malinsky.
# simulate linear Gaussian data w unobserved variable L1
set.seed(123)
n <- 100
L1 <- rnorm(n)
X1 <- rnorm(n)
X2 <- L1 + X1 + rnorm(n)
X3 <- X1 + rnorm(n)
X4 <- X3 + L1 + rnorm(n)
d <- data.frame(p1_X1 = X1, p1_X2 = X2, p2_X3 = X3, p2_X4 = X4)

# use tfci algorithm to recover tpag (conservative edge orientation)
tfci(d, test = corTest, order = c("p1", "p2"))

# use tfci with standard (non-conservative) method for edge orientation
tfci(d, test = corTest, order = c("p1", "p2"), methodOri = "standard")
Perform causal discovery using the temporal PC algorithm (TPC)
tpc(
  data = NULL,
  order,
  sparsity = 10^(-1),
  test = regTest,
  suffStat = NULL,
  method = "stable.fast",
  methodNA = "none",
  methodOri = "conservative",
  output = "tpdag",
  varnames = NULL,
  ...
)
data |
A data.frame with data. All variables should be assigned to exactly one period by prefixing them with the period name (see example below). |
order |
A character vector with period-prefixes in their temporal order (see example below). |
sparsity |
The sparsity level to be used for independence testing (i.e. significance level threshold to use for each test). |
test |
A procedure for testing conditional independence.
The default, |
suffStat |
Sufficient statistic. If this argument is supplied, the sufficient statistic is not computed from the inputted data. The format and contents of the sufficient statistic depends on which test is being used. |
method |
Which method to use for skeleton construction, must be
|
methodNA |
Method for handling missing information ( |
methodOri |
Method for handling conflicting separating sets when orienting edges. Currently only the conservative method is available. |
output |
One of |
varnames |
A character vector of variable names. It only needs to be supplied
if the |
... |
Further optional arguments which are passed to
|
Note that all independence test procedures implemented
in the pcalg
package may be used, see pc
.
The methods for handling missing information require that the data
,
rather than the suffStat
argument is used for inputting data; the latter
assumes no missing information and hence always sets methodNA = "none"
.
If the test is corTest
, test-wise deletion is performed when computing the
sufficient statistic (correlation matrix) (so for each pair of variables, only
complete cases are used). If the test is regTest
, test-wise deletion
is performed for each conditional independence test instead.
A tpdag
or tskeleton
object. Both return types are
S3 objects, i.e., lists with entries: $tamat
(the estimated adjacency
matrix), $order
(character vector with the order, as inputted to
this function), $psi
(the significance level used for testing), and
$ntests
(the number of tests conducted).
# TPC on included example data, use sparsity psi = 0.01, default test (regression-based
# information loss):
data(tpcExample)
tpc(tpcExample, order = c("child", "youth", "oldage"), sparsity = 0.01)

# TPC on included example data, use sparsity psi = 0.01, use test for vanishing partial
# correlations:
data(tpcExample)
tpc(tpcExample, order = c("child", "youth", "oldage"), sparsity = 0.01, test = corTest)

# TPC on another simulated data set

# Simulate data
set.seed(123)
n <- 500
child_x <- rnorm(n)^2
child_y <- 0.5 * child_x + rnorm(n)
child_z <- sample(c(0, 1), n, replace = TRUE, prob = c(0.3, 0.7))
adult_x <- child_x + rnorm(n)
adult_z <- as.numeric(child_z + rnorm(n) > 0)
adult_w <- 2 * adult_z + rnorm(n)
adult_y <- 2 * sqrt(child_x) + adult_w^2 + rnorm(n)
simdata <- data.frame(child_x, child_y, child_z,
                      adult_x, adult_z, adult_w, adult_y)

# Define order
simorder <- c("child", "adult")

# Perform TPC with sparsity psi = 0.001
results <- tpc(simdata, order = simorder, sparsity = 10^(-3))
A small simulated data example intended to showcase the TPC algorithm. Note that the variable name prefixes define which period they relate to ("child", "youth" or "oldage").
tpcExample
A data.frame with 200 rows and 6 variables.
Each of the six variables is generated from its own structural equation; see the package documentation for the exact specifications.
Petersen, AH; Osler, M and Ekstrøm, CT (2021): Data-Driven Model Building for Life-Course Epidemiology, American Journal of Epidemiology.
data(tpcExample)
Generates a plot of a tamat, tpdag or tpag object by use of Latex tikz. Note that a working Latex installation is required. Note also that this function is slower than typical R graphics options and may take some time to terminate.
tplot(
  x,
  filename = "causaldisco_tplot_temp",
  keepfiles = FALSE,
  bendedges = TRUE,
  ...
)
x |
A tamat, tpdag, or tpag object as obtained from |
filename |
Name of files that will be used internally during the function's
runtime. This filename will be appended with both .rmd and .pdf.
Note that unless |
keepfiles |
If |
bendedges |
If |
... |
Additional argument passed to |
The function renders Latex code using rmarkdown, which relies on a working installation of Latex. Afterwards, the resulting pdf graphic is loaded into R and displayed in a browser. If working in Rstudio it may be opened in the built-in viewer, depending on Rstudio global settings.
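A usage sketch, mirroring the maketikz examples; it is wrapped in a not-run block because rendering requires a working Latex installation:

## Not run:
thisdag <- simDAG(5)
rownames(thisdag) <- colnames(thisdag) <- c("child_x", "child_y", "child_z",
                                            "adult_x", "adult_y")
thistamat <- tamat(thisdag, order = c("child", "adult"))
tplot(thistamat)
## End(Not run)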