Title: | Tools for Principal Component Analysis-Based Data Structure Comparisons |
---|---|
Description: | A suite of non-parametric, visual tools for assessing differences in data structures for two datasets that contain different observations of the same variables. These tools are all based on Principal Component Analysis (PCA) and thus effectively address differences in the structures of the covariance matrices of the two datasets. The PCADSC tools consist of easy-to-use, intuitive plots that each focus on different aspects of the PCA decompositions. The cumulative eigenvalue (CE) plot describes differences in the variance components (eigenvalues) of the deconstructed covariance matrices. The angle plot presents the information loss when moving from the PCA decomposition of one dataset to the PCA decomposition of the other. The chroma plot describes the loading patterns of the two datasets, thereby presenting the relative weighting and importance of the variables from the original dataset. |
Authors: | Anne Helby Petersen [aut, cre], Bo Markussen [aut] |
Maintainer: | Anne Helby Petersen <[email protected]> |
License: | GPL-2 |
Version: | 0.9.0 |
Built: | 2025-01-25 03:03:19 UTC |
Source: | https://github.com/annennenne/pcadsc |
Produce an angle plot from a full or partial PCADSC object, as obtained from a call to PCADSC. In either case, this PCADSC object must have a non-NULL angleInfo slot (see examples). The angle plot compares the eigenvalue and loading patterns from PCA performed on two datasets that consist of different observations of the same variables.
anglePlot(x)
x |
A |
    #load iris data
    data(iris)

    #Define grouping variable, grouping the observations by whether their species is
    #Setosa or not
    iris$group <- "setosa"
    iris$group[iris$Species != "setosa"] <- "non-setosa"
    iris$Species <- NULL

    ## Not run:
    #make a full PCADSC object, splitting the data by "group"
    irisPCADSC <- PCADSC(iris, "group")

    #make a partial PCADSC object from iris and fill out angleInfo in the next call
    irisPCADSC2 <- PCADSC(iris, "group", doAngle = FALSE)
    irisPCADSC2 <- doAngle(irisPCADSC2)

    #make an angle plot
    anglePlot(irisPCADSC)
    anglePlot(irisPCADSC2)

    ## End(Not run)

    #Only do angle information for a faster run-time
    irisPCADSC_fast <- PCADSC(iris, "group", doCE = FALSE, doChroma = FALSE)
    anglePlot(irisPCADSC_fast)
Produce a cumulative eigenvalue (CE) plot from a full or partial PCADSC object, as obtained from a call to PCADSC. In either case, this PCADSC object must have a non-NULL CEInfo slot (see examples). The CE plot compares the eigenvalues obtained from PCA performed separately and jointly on two datasets that consist of different observations of the same variables.
CEPlot(x, nDraw = NULL)
x |
x A |
nDraw |
A positive integer. The number of simulated cumulative eigenvalue curves that should be added to the plot. |
The x-coordinates show the cumulative differences in eigenvalues, while the y-coordinates show the cumulative sum of the joint eigenvalues. The plot is annotated with Kolmogorov-Smirnov and Cramer-von Mises tests, evaluated by permutation, of the null hypothesis of no difference in eigenvalues. A number of simulated cumulative eigenvalue curves (see nDraw) are drawn as dashed lines. Moreover, a shaded area presents pointwise 95 % confidence bands for the cumulative difference, also obtained using the permutation test.
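The quantities behind such a plot can be sketched in base R. This is only an illustration under stated assumptions, not the code used by PCADSC: the helper names pcaEigen and cumDiff are hypothetical, the eigenvalues are taken from the correlation matrix without any further normalization, and the Kolmogorov-Smirnov-type statistic is assumed to be the maximum absolute cumulative difference.

```r
# Illustrative sketch only; helper names and the exact statistic are assumptions.
data(iris)
X <- iris[, 1:4]
group <- ifelse(iris$Species == "setosa", "setosa", "non-setosa")

# Eigenvalues of the correlation matrix, i.e. PCA on standardized variables
pcaEigen <- function(d) eigen(cor(d), symmetric = TRUE)$values

# Cumulative difference in eigenvalues between the two groups given by 'labels'
cumDiff <- function(labels) {
  lev <- sort(unique(labels))
  cumsum(pcaEigen(X[labels == lev[1], ]) - pcaEigen(X[labels == lev[2], ]))
}

# Observed cumulative differences and cumulative joint eigenvalues
obsDiff <- cumDiff(group)
jointCum <- cumsum(pcaEigen(X))

# Permutation reference: shuffle the group labels and recompute B times
set.seed(1)
B <- 200
permDiff <- replicate(B, cumDiff(sample(group)))

# Assumed KS-type statistic: maximum absolute cumulative difference
ksObs <- max(abs(obsDiff))
ksPerm <- apply(abs(permDiff), 2, max)
pValue <- mean(ksPerm >= ksObs)
```

Each column of permDiff plays the role of one simulated curve, and pointwise quantiles of its rows would give a band comparable to the shaded area described above.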
    #load iris data
    data(iris)

    #Define grouping variable, grouping the observations by whether their species is
    #Setosa or not
    iris$group <- "setosa"
    iris$group[iris$Species != "setosa"] <- "non-setosa"
    iris$Species <- NULL

    ## Not run:
    #make a PCADSC object, splitting the data by "group"
    irisPCADSC <- PCADSC(iris, "group")

    #make a partial PCADSC object from iris and fill out CEInfo in the next call
    irisPCADSC2 <- PCADSC(iris, "group", doCE = FALSE)
    irisPCADSC2 <- doCE(irisPCADSC2)

    #make a CE plot
    CEPlot(irisPCADSC)
    CEPlot(irisPCADSC2)

    ## End(Not run)

    #Only do CE information and use fewer resamplings for a faster runtime
    irisPCADSC_fast <- PCADSC(iris, "group", doAngle = FALSE, doChroma = FALSE, B = 1000)
    CEPlot(irisPCADSC_fast)
Produce a chroma plot from a full or partial PCADSC object, as obtained from a call to PCADSC. In either case, this PCADSC object must have a non-NULL chromaInfo slot (see examples). The chroma plot compares the loading patterns from PCA conducted on two datasets consisting of different observations of the same variables.
chromaPlot( x, varLabels = NULL, cvCO = 1, splitLabels = NULL, varAnnotation = "cum", useComps = NULL )
x |
Either a |
varLabels |
A vector of character string labels for the variables used in
|
cvCO |
A numeric in the interval |
splitLabels |
Labels for the two categories of the splitting variable used
to create the |
varAnnotation |
If |
useComps |
A vector of integers with the indexes of the principal component that should be included in the plot. |
The plot consists of one display for each of the two datasets. The two displays both consist of a number of vertical bars. Each vertical bar represents a principal component and the width of each colored section (chroma) within the bar corresponds to the normalized PCA loading vector of that component. The bars can be annotated with the (cumulative) variance contributions of the components (see varAnnotation).
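As a rough illustration of what each bar encodes, the sketch below computes per-component variable shares and cumulative variance contributions with base R's prcomp. The normalization shown (absolute loadings rescaled to sum to one within each component) and the way the cutoff is applied are assumptions made for illustration; the source does not state which normalization chromaPlot uses.

```r
# Illustrative sketch only; the normalization is an assumption, not PCADSC's code.
data(iris)
X <- iris[iris$Species == "setosa", 1:4]

# PCA on standardized variables
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Assumed normalization: absolute loadings rescaled to sum to 1 per component,
# so each column gives the section sizes of one bar
absLoad <- abs(pca$rotation)
shares <- sweep(absLoad, 2, colSums(absLoad), "/")

# Cumulative variance contributions, as could be used for bar annotation
cumVar <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Assumed reading of a cutoff such as cvCO = 0.8: keep the first components
# needed to reach 80 percent cumulative variance
cvCO <- 0.8
nComp <- which(cumVar >= cvCO)[1]

round(shares[, seq_len(nComp), drop = FALSE], 2)
```

Repeating the computation for the second group would give the second display, so the two sets of shares can be compared component by component.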
    #load iris data
    data(iris)

    #Define grouping variable, grouping the observations by whether their species is
    #Setosa or not
    iris$group <- "setosa"
    iris$group[iris$Species != "setosa"] <- "non-setosa"
    iris$Species <- NULL

    ## Not run:
    #make a PCADSC object, splitting the data by "group"
    irisPCADSC <- PCADSC(iris, "group")

    #make a partial PCADSC object from iris and fill out chromaInfo in the next call
    irisPCADSC2 <- PCADSC(iris, "group", doChroma = FALSE)
    irisPCADSC2 <- doChroma(irisPCADSC2)

    #make a chroma plot
    chromaPlot(irisPCADSC)
    chromaPlot(irisPCADSC2)

    #Change the labels of the splitting variable
    chromaPlot(irisPCADSC, splitLabels = list("non-setosa" = "Not Setosa",
                                              "setosa" = "Setosa"))

    #Only plot components 1 and 4 and remove annotated variances
    chromaPlot(irisPCADSC, useComps = c(1,4), varAnnotation = "no")

    #Only plot the first components responsible for explaining 80 percent variance
    chromaPlot(irisPCADSC, cvCO = 0.8)

    #Change variable labels
    chromaPlot(irisPCADSC, varLabels = c("Sepal length", "Sepal width",
                                         "Petal length", "Petal width"))

    ## End(Not run)

    #Only do chroma information in order to get a faster runtime:
    irisPCADSC_fast <- PCADSC(iris, "group", doCE = FALSE, doAngle = FALSE)
    chromaPlot(irisPCADSC_fast)
Computes the information that is needed in order to make an anglePlot from a PCADSC or pcaRes object. Typically, this function is called on a partial PCADSC object in order to add angleInfo (see examples).
doAngle(x, ...)
x |
Either a |
... |
If |
    #load iris data
    data(iris)

    #Define grouping variable, grouping the observations by whether their species is
    #Setosa or not
    iris$group <- "setosa"
    iris$group[iris$Species != "setosa"] <- "non-setosa"
    iris$Species <- NULL

    ## Not run:
    #make a partial PCADSC object, splitting the data by "group"
    irisPCADSC <- PCADSC(iris, "group", doAngle = FALSE)

    #No angleInfo available
    irisPCADSC$angleInfo

    #Add and show angleInfo
    irisPCADSC <- doAngle(irisPCADSC)
    irisPCADSC$angleInfo

    ## End(Not run)

    #Make a partial PCADSC object and only add angle information for a
    #faster runtime
    irisPCADSC_fast <- PCADSC(iris, "group", doAngle = FALSE, doChroma = FALSE,
                              doCE = FALSE)
    irisPCADSC_fast <- doAngle(irisPCADSC_fast, B = 100)
    irisPCADSC_fast$angleInfo
Computes the information that is needed in order to make a CEPlot from a PCADSC or pcaRes object. Typically, this function is called on a partial PCADSC object in order to add CEInfo (see examples).
doCE(x, ...)
x |
Either a |
... |
If |
    #load iris data
    data(iris)

    #Define grouping variable, grouping the observations by whether their species is
    #Setosa or not
    iris$group <- "setosa"
    iris$group[iris$Species != "setosa"] <- "non-setosa"
    iris$Species <- NULL

    ## Not run:
    #make a partial PCADSC object, splitting the data by "group"
    irisPCADSC <- PCADSC(iris, "group", doCE = FALSE)

    #No CEInfo available
    irisPCADSC$CEInfo

    #Add and show CEInfo
    irisPCADSC <- doCE(irisPCADSC)
    irisPCADSC$CEInfo

    ## End(Not run)

    #Make a partial PCADSC object and only add CE information, using fewer
    #resamplings (B = 100) for a faster runtime
    irisPCADSC_fast <- PCADSC(iris, "group", doAngle = FALSE, doChroma = FALSE,
                              doCE = FALSE)
    irisPCADSC_fast <- doCE(irisPCADSC_fast, B = 100)
    irisPCADSC_fast$CEInfo
Computes the information that is needed in order to make a chromaPlot from a PCADSC or pcaRes object. Typically, this function is called on a partial PCADSC object in order to add chromaInfo (see examples).
doChroma(x)
x |
Either a |
    #load iris data
    data(iris)

    #Define grouping variable, grouping the observations by whether their species is
    #Setosa or not
    iris$group <- "setosa"
    iris$group[iris$Species != "setosa"] <- "non-setosa"
    iris$Species <- NULL

    ## Not run:
    #make a partial PCADSC object, splitting the data by "group"
    irisPCADSC <- PCADSC(iris, "group", doChroma = FALSE)

    #No chromaInfo available
    irisPCADSC$chromaInfo

    #Add and show chromaInfo
    irisPCADSC <- doChroma(irisPCADSC)
    irisPCADSC$chromaInfo

    ## End(Not run)

    #Make a partial PCADSC object and only add chroma information for a
    #faster runtime
    irisPCADSC_fast <- PCADSC(iris, "group", doAngle = FALSE, doChroma = FALSE,
                              doCE = FALSE)
    irisPCADSC_fast <- doChroma(irisPCADSC_fast)
    irisPCADSC_fast$chromaInfo
Principal Component Analysis-based Data Structure Comparison tools that prepare a dataset for various diagnostic plots for comparing data structures. More specifically, PCADSC performs PCA on two subsets of a dataset in order to compare the structures of these datasets, e.g. to assess whether they can be analyzed pooled or not. The results of the PCAs are then manipulated in various ways and stored for easy plotting using the three PCADSC plotting tools: the CEPlot, the anglePlot and the chromaPlot.
PCADSC( data, splitBy, vars = NULL, doCE = TRUE, doAngle = TRUE, doChroma = TRUE, B = 10000, use = "complete.obs" )
data |
A dataset, either a |
splitBy |
The name of a grouping variable with two levels defining the two groups within the dataset whose data structures we wish to compare. |
vars |
The variable names in |
doCE |
Logical. Should the cumulative eigenvalue plot information be computed? |
doAngle |
Logical. Should the angle plot information be computed? |
doChroma |
Logical. Should the chroma plot information be computed? |
B |
A positive integer. The number of resampling steps performed in the cumulative eigenvalue step, if relevant. |
use |
A character string specifying what observations should be used in the presence
of missing information. Defaults to |
PCADSC presents a suite of non-parametric, visual tools for comparing the structures of two subsets of a dataset. These tools are all based on PCA (principal component analysis) and thus they can be interpreted as comparisons of the covariance matrices of the two (sub)datasets.
PCADSC performs PCA using singular value decomposition for increased numerical precision. Before performing PCA on the full dataset and the two subsets, all variables within each such dataset are standardized.
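The approach described here, PCA via singular value decomposition of standardized variables, can be sketched in base R as follows. This is a minimal illustration of the general technique and not PCADSC's internal implementation; the variable names are chosen for this example only.

```r
# Illustrative sketch only: SVD-based PCA on standardized variables (base R).
data(iris)
X <- as.matrix(iris[, 1:4])

# Standardize each variable to mean 0 and standard deviation 1
Z <- scale(X, center = TRUE, scale = TRUE)

# Singular value decomposition: Z = U D V'
sv <- svd(Z)

# Eigenvalues of the correlation matrix are the squared singular values
# divided by (n - 1)
eigenvalues <- sv$d^2 / (nrow(Z) - 1)

# Columns of V are the loading vectors
loadings <- sv$v
rownames(loadings) <- colnames(X)

# Cross-check against an eigendecomposition of the correlation matrix
eig <- eigen(cor(X), symmetric = TRUE)
all.equal(eigenvalues, eig$values)
```

Because the variables are standardized, the eigenvalues sum to the number of variables, which makes eigenvalues from datasets of different sizes directly comparable.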
An object of class PCADSC, which is a named list with the following entries:
The results of the PCAs performed on the first subset, the second subset and the full dataset, and also information about the data splitting.
The information needed for making a cumulative eigenvalue plot (see CEPlot).
The information needed for making an angle plot (see anglePlot).
The information needed for making a chroma plot (see chromaPlot).
The original (full) dataset.
The name of the variable that splits the dataset in two.
The names of the variables in the dataset that should be used for PCA.
The number of resamplings performed for the CEInfo.
doCE, doAngle, doChroma, CEPlot, anglePlot, chromaPlot
    #load iris data
    data(iris)

    #Define grouping variable, grouping the observations by whether their species is
    #Setosa or not
    iris$group <- "setosa"
    iris$group[iris$Species != "setosa"] <- "non-setosa"
    iris$Species <- NULL

    ## Not run:
    #Make a full PCADSC object, splitting the data by "group"
    irisPCADSC <- PCADSC(iris, "group")

    #The three plotting functions can now be called on irisPCADSC:
    CEPlot(irisPCADSC)
    anglePlot(irisPCADSC)
    chromaPlot(irisPCADSC)

    #Make a partial PCADSC object with no angle plot information and add
    #angle plot information afterwards:
    irisPCADSC2 <- PCADSC(iris, "group", doAngle = FALSE)
    irisPCADSC2 <- doAngle(irisPCADSC2)

    ## End(Not run)

    #Make a partial PCADSC object with no plotting (angle/CE/chroma)
    #information:
    irisPCADSC_minimal <- PCADSC(iris, "group", doAngle = FALSE, doCE = FALSE,
                                 doChroma = FALSE)
Quantile-quantile plot for the observed anglePlot p-values against the null hypothesis of equal structure. The size of each point displays the length of the corresponding arrow in the anglePlot. Also shown are a pointwise 95 % confidence interval under the null and 20 random samples under the null. Finally, the p-value for a one-sided Kolmogorov-Smirnov test on the logarithmic scale is given.
QQanglePlot(x)
x |
A |