R/processStudy.R
computePCARefSample.Rd
This function generates a PCA using the known reference profiles. Then, it projects the specified profile onto the PCA axes.
computePCARefSample(
  gdsProfile,
  currentProfile,
  studyIDRef = "Ref.1KG",
  np = 1L,
  algorithm = c("exact", "randomized"),
  eigenCount = 32L,
  missingRate = NaN,
  verbose = FALSE
)
gdsProfile: an object of class gds.class, an opened Profile GDS file.
currentProfile: a single character string representing the profile identifier.
studyIDRef: a single character string representing the study identifier.
np: a single positive integer representing the number of CPUs to use. Default: 1L.
algorithm: a character string representing the algorithm used to calculate the PCA. The 2 choices are "exact" (traditional exact calculation) and "randomized" (fast PCA with the randomized algorithm introduced in Galinsky et al. 2016). Default: "exact".
eigenCount: a single integer indicating the number of eigenvectors that will be in the output of the snpgdsPCA function; if the value is <= 0, all eigenvectors are returned. Default: 32L.
missingRate: a numeric value representing the missing-rate threshold above which SNVs are discarded; only the SNVs with a missing rate <= missingRate are retained by snpgdsPCA; if NaN, no missing-rate threshold is applied. Default: NaN.
verbose: a logical indicating whether messages should be printed to report the different steps of the function. Default: FALSE.
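The behaviour described for missingRate can be illustrated with a minimal base R sketch. The genotype matrix and the helper filterByMissingRate() are hypothetical and exist only for this illustration; the actual filtering is done inside snpgdsPCA and may differ in its details.

```r
## Toy genotype matrix (made-up values): rows are profiles, columns are SNVs,
## NA marks a missing genotype
geno <- matrix(c(0, 1, NA, 2,
                 1, NA, NA, 0,
                 2, 1, NA, 1,
                 0, 0, 1, 2),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("s", 1:4), paste0("snv", 1:4)))

## Hypothetical helper (not part of RAIDS): return the names of the SNVs
## whose missing rate is <= missingRate; with NaN, no SNV is filtered out
filterByMissingRate <- function(geno, missingRate = NaN) {
    if (is.nan(missingRate)) {
        return(colnames(geno))
    }
    ## Fraction of missing genotypes for each SNV
    rate <- colMeans(is.na(geno))
    colnames(geno)[rate <= missingRate]
}

keptAll <- filterByMissingRate(geno)                      ## no threshold
kept25  <- filterByMissingRate(geno, missingRate = 0.25)  ## drops snv3 only
```

Here snv3 is missing in 3 of the 4 profiles (rate 0.75), so it is the only SNV discarded with a threshold of 0.25.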
a list containing 3 entries:
sample.id: a character string representing the unique identifier of the analyzed profile.
eigenvector.ref: a numeric matrix representing the eigenvectors of the reference profiles.
eigenvector: a numeric matrix representing the eigenvectors of the analyzed profile.
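The two-step idea (build the PCA from the reference profiles only, then project one profile onto those axes) can be sketched with base R. This is an illustration on simulated data using prcomp() and predict(); it is not the package's implementation, which relies on snpgdsPCA and may normalize genotypes and scale the axes differently. The list built at the end only mimics the three entry names described above.

```r
set.seed(42)

## Simulated genotypes (not real data): 20 reference profiles, 50 SNVs
nRef <- 20
nSNV <- 50
refGeno <- matrix(rbinom(nRef * nSNV, size = 2, prob = 0.3), nrow = nRef,
                  dimnames = list(paste0("ref", seq_len(nRef)), NULL))

## One simulated profile to project
newGeno <- matrix(rbinom(nSNV, size = 2, prob = 0.3), nrow = 1,
                  dimnames = list("ex1", NULL))

## Step 1: PCA computed on the reference profiles only
pcaRef <- prcomp(refGeno, center = TRUE, scale. = FALSE)

## Step 2: project the new profile using the reference centering and loadings
nPC <- 5
projected <- predict(pcaRef, newdata = newGeno)[, seq_len(nPC), drop = FALSE]

## Assemble a list with the same entry names as the documented return value
res <- list(sample.id = rownames(newGeno),
            eigenvector.ref = pcaRef$x[, seq_len(nPC), drop = FALSE],
            eigenvector = projected)
```

Because the axes are estimated without the projected profile, that profile cannot influence the reference structure it is compared against.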
Galinsky KJ, Bhatia G, Loh PR, Georgiev S, Mukherjee S, Patterson NJ, Price AL. Fast Principal-Component Analysis Reveals Convergent Evolution of ADH1B in Europe and East Asia. Am J Hum Genet. 2016 Mar 3;98(3):456-72. doi: 10.1016/j.ajhg.2015.12.022. Epub 2016 Feb 25.
## Required libraries
library(RAIDS)
library(gdsfmt)
library(SNPRelate)  ## provides snpgdsOpen()

## Path to the demo Profile GDS file included in this package
dataDir <- system.file("extdata/demoAncestryCall", package="RAIDS")

## Open the Profile GDS file
gdsProfile <- snpgdsOpen(file.path(dataDir, "ex1.gds"))

## Project a profile onto a PCA generated using reference profiles
## The reference profiles come from 1KG
resPCA <- computePCARefSample(gdsProfile=gdsProfile,
    currentProfile=c("ex1"), studyIDRef="Ref.1KG", np=1L, verbose=FALSE)
resPCA$sample.id
#> [1] "ex1"
resPCA$eigenvector
#>            [,1]      [,2]       [,3]        [,4]        [,5]        [,6]
#> ex1 -0.03917926 0.0290796 -0.1861643 -0.05760641 -0.01053691 -0.08274071
#>          [,7]       [,8]         [,9]     [,10]      [,11]       [,12]
#> ex1 0.0777924 -0.2437205 -0.008855972 0.2156765 -0.1139829 -0.08007963
#>          [,13]    [,14]     [,15]      [,16]    [,17]      [,18]     [,19]
#> ex1 -0.1452985 0.233155 0.5753156 -0.1938115 0.504467 -0.8293339 0.5437238
#>          [,20]      [,21]      [,22]     [,23]      [,24]      [,25]     [,26]
#> ex1 -0.1480745 0.03492421 -0.2146903 0.1610501 -0.3487348 -0.2806519 0.4095053
#>          [,27]     [,28]     [,29]      [,30]      [,31]      [,32]
#> ex1 -0.1480394 -1.001517 0.2316207 -0.3235428 -0.3843232 -0.3291498
## Close the GDS file (important)
closefn.gds(gdsProfile)
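Since closing the GDS file matters, one possible pattern is to tie the close to the exit of a function with on.exit(), so the file is released even when an error occurs. The sketch below is an assumption-laden illustration: withGDS() is a hypothetical helper, not part of RAIDS or gdsfmt, and it uses a small temporary GDS file opened with openfn.gds() rather than the demo Profile GDS file.

```r
library(gdsfmt)

## Create a small temporary GDS file, for demonstration only
tmpFile <- tempfile(fileext = ".gds")
gdsTmp <- createfn.gds(tmpFile)
add.gdsn(gdsTmp, "sample.id", c("ex1", "ex2"))
closefn.gds(gdsTmp)

## Hypothetical helper: open a GDS file, apply a function to it, and
## always close the file, even if the function stops with an error
withGDS <- function(path, fun) {
    gds <- openfn.gds(path, readonly = TRUE)
    on.exit(closefn.gds(gds), add = TRUE)
    fun(gds)
}

## Read the sample identifiers; the file is closed when withGDS() returns
ids <- withGDS(tmpFile, function(gds) {
    read.gdsn(index.gdsn(gds, "sample.id"))
})

## Remove the temporary file
unlink(tmpFile)
```

The same idea should carry over to a Profile GDS file by wrapping the call to computePCARefSample() in the function passed to the helper.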