R/processStudy.R
pruningSample.Rd
This function computes the list of pruned SNVs for a specific profile. When
a group of SNVs is in linkage disequilibrium, only one SNV from that group
is retained. The linkage disequilibrium is calculated with the
snpgdsLDpruning() function. The initial list of SNVs passed to the
snpgdsLDpruning() function can be specified by the user.
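To make the "one SNV per linked group" idea concrete, here is a toy sketch in base R. It is a simplified greedy stand-in, not the algorithm implemented by snpgdsLDpruning(); the helper toyPrune(), the genotype values and the positions are all hypothetical. It keeps an SNV only if no already-kept SNV within the sliding window is correlated with it above the threshold.

```r
## Toy illustration only: greedy LD pruning on hand-made genotypes (0/1/2).
## NOT the SNPRelate implementation; toyPrune() and the data are hypothetical.
snv1 <- rep(c(0L, 1L, 2L, 0L, 1L), times = 4)
snv2 <- snv1
snv2[1] <- 1L                                  # near-copy of snv1 (strong LD)
snv3 <- rep(c(1L, 0L, 1L, 1L, 2L), times = 4)  # uncorrelated with snv1
snv4 <- snv1                                   # exact copy of snv1, but far away
geno <- cbind(snv1, snv2, snv3, snv4)
position <- c(1000L, 2000L, 3000L, 900000L)

toyPrune <- function(geno, position, thresholdLD = sqrt(0.1),
                        slideWindowMaxBP = 500000L) {
    kept <- integer(0)
    for (i in seq_len(ncol(geno))) {
        inLD <- FALSE
        for (j in kept) {
            ## Only SNVs inside the sliding window are compared
            closeBy <- abs(position[i] - position[j]) <= slideWindowMaxBP
            if (closeBy && abs(cor(geno[, i], geno[, j])) > thresholdLD) {
                inLD <- TRUE
                break
            }
        }
        if (!inLD) kept <- c(kept, i)
    }
    colnames(geno)[kept]
}

prunedToy <- toyPrune(geno, position)
prunedToy
```

In this toy data, snv2 is dropped because it is strongly correlated with snv1, while snv4 is kept even though it is identical to snv1, because it lies outside the 500000 bp window.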
pruningSample(
    gdsReference,
    method = c("corr", "r", "dprime", "composite"),
    currentProfile,
    studyID,
    listSNP = NULL,
    slideWindowMaxBP = 500000L,
    thresholdLD = sqrt(0.1),
    np = 1L,
    verbose = FALSE,
    chr = NULL,
    superPopMinAF = NULL,
    keepPrunedGDS = TRUE,
    pathProfileGDS = NULL,
    keepFile = FALSE,
    pathPrunedGDS = ".",
    outPrefix = "pruned"
)
gdsReference: an object of class gds.class (a GDS file), the opened 1KG GDS file used as the reference data set.
method: a character string that represents the method used to calculate
the linkage disequilibrium in the snpgdsLDpruning() function. The 4
possible values are: "corr", "r", "dprime" and "composite". Default: "corr".
currentProfile: a character string corresponding to the profile identifier
used in the LD pruning done by the snpgdsLDpruning() function. A Profile GDS
file corresponding to the profile identifier must exist and be located in
the pathProfileGDS directory.
studyID: a character string corresponding to the study identifier used in
the snpgdsLDpruning() function. The study identifier must be present in the
Profile GDS file.
listSNP: a vector of SNV identifiers specifying the SNVs selected to be
passed to the pruning function; if NULL, all SNVs are used in the
snpgdsLDpruning() function. Default: NULL.
slideWindowMaxBP: a single positive integer that represents the maximum
number of base pairs (bp) in the sliding window. This parameter is used for
the LD pruning done in the snpgdsLDpruning() function. Default: 500000L.
thresholdLD: a single numeric value that represents the LD threshold used
in the snpgdsLDpruning() function. Default: sqrt(0.1).
np: a single positive integer specifying the number of threads to be used.
Default: 1L.
verbose: a logical indicating if information is shown during the process
in the snpgdsLDpruning() function. Default: FALSE.
chr: a character string representing the chromosome to which the selected
SNVs should belong. Only one chromosome can be handled. If NULL, the
chromosome is not used as a filtering criterion. Default: NULL.
superPopMinAF: a single positive numeric representing the minimum allelic
frequency used to select the SNVs. If NULL, the allelic frequency is not
used as a filtering criterion. Default: NULL.
keepPrunedGDS: a logical indicating if the information about the pruned
SNVs should be added to the Profile GDS file. Default: TRUE.
pathProfileGDS: a character string representing the directory where the
Profile GDS file is located. The directory must exist.
keepFile: a logical indicating if RDS files containing the information
about the pruned SNVs must be created. Default: FALSE.
pathPrunedGDS: a character string representing an existing directory.
Default: ".".
outPrefix: a character string that represents the prefix of the RDS files
that will be generated. The RDS files are only generated when the parameter
keepFile=TRUE. Default: "pruned".
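Two of the defaults above are easier to read with a short base R sketch. It assumes that the vector default of method is resolved with match.arg(), the usual R idiom (the source does not confirm this), and that the LD threshold is compared against a correlation-type measure, in which case sqrt(0.1) corresponds to a squared value of 0.1. The helper resolveMethod() is hypothetical and not part of RAIDS.

```r
## Hypothetical helper showing how a vector default is commonly resolved in R
resolveMethod <- function(method = c("corr", "r", "dprime", "composite")) {
    match.arg(method)
}
resolveMethod()            # no value supplied: the first choice, "corr"
resolveMethod("dprime")    # an explicit, valid choice is returned as-is

## The default LD threshold and its square
thresholdLD <- sqrt(0.1)
round(thresholdLD, 4)      # 0.3162
thresholdLD^2              # approximately 0.1
```

With match.arg(), a value outside the four listed choices raises an error instead of being silently accepted.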
The function returns 0L when successful.
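## Required libraries: gdsfmt and SNPRelate for GDS access (snpgdsOpen()),
## RAIDS for pruningSample()
library(gdsfmt)
library(SNPRelate)
library(RAIDS)
## The demo Reference GDS file is located in this package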
dataDir <- system.file("extdata/tests", package="RAIDS")
fileGDS <- file.path(dataDir, "ex1_good_small_1KG.gds")
## The data.frame containing the information about the study
## The 3 mandatory columns: "study.id", "study.desc", "study.platform"
## The entries should be strings, not factors (stringsAsFactors=FALSE)
studyDF <- data.frame(study.id = "MYDATA",
                      study.desc = "Description",
                      study.platform = "PLATFORM",
                      stringsAsFactors = FALSE)
## The data.frame containing the information about the samples
## The entries should be strings, not factors (stringsAsFactors=FALSE)
samplePED <- data.frame(Name.ID = c("ex1", "ex2"),
                        Case.ID = c("Patient_h11", "Patient_h12"),
                        Diagnosis = rep("Cancer", 2),
                        Sample.Type = rep("Primary Tumor", 2),
                        Source = rep("Databank B", 2),
                        stringsAsFactors = FALSE)
rownames(samplePED) <- samplePED$Name.ID
## Temporary Profile GDS file
profileFile <- file.path(tempdir(), "ex1.gds")
## Copy the Profile GDS file demo that has not been pruned yet
file.copy(file.path(dataDir, "ex1_demo.gds"), profileFile)
#> [1] TRUE
## Open 1KG file
gds1KG <- snpgdsOpen(fileGDS)
## Compute the list of pruned SNVs for a specific profile 'ex1'
## and save it in the Profile GDS file 'ex1.gds'
pruningSample(gdsReference = gds1KG, currentProfile = c("ex1"),
              studyID = studyDF$study.id, pathProfileGDS = tempdir())
#> [1] 0
## Close the Reference GDS file (important)
closefn.gds(gds1KG)
## Check content of Profile GDS file
## The 'pruned.study' entry should be present
content <- openfn.gds(profileFile)
content
#> File: /tmp/Rtmps2Gf87/ex1.gds (4.3K)
#> + [ ]
#> |--+ Ref.count { SparseInt16 11000x1, 568B }
#> |--+ Alt.count { SparseInt16 11000x1, 74B }
#> |--+ Total.count { SparseInt16 11000x1, 580B }
#> |--+ study.list [ data.frame ] *
#> | |--+ study.id { Str8 1, 7B }
#> | |--+ study.desc { Str8 1, 12B }
#> | \--+ study.platform { Str8 1, 9B }
#> |--+ study.annot [ data.frame ] *
#> | |--+ data.id { Str8 1, 4B }
#> | |--+ case.id { Str8 1, 12B }
#> | |--+ sample.type { Str8 1, 14B }
#> | |--+ diagnosis { Str8 1, 7B }
#> | |--+ source { Str8 1, 11B }
#> | \--+ study.id { Str8 1, 7B }
#> |--+ geno.ref { Bit2 11000x1 LZMA_ra(10.7%), 301B }
#> \--+ pruned.study { Str8 40, 379B }
## Close the Profile GDS file (important)
closefn.gds(content)
## Remove Profile GDS file (created for demo purpose)
unlink(profileFile, force=TRUE)
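To work with the retained SNV identifiers rather than just list the file content, the 'pruned.study' node can be read back with gdsfmt. The sketch below is self-contained: it builds a small stand-in GDS file whose 'pruned.study' node holds made-up identifiers (the file name and the values are hypothetical), then reads the node the same way one would from a real Profile GDS file.

```r
library(gdsfmt)

## Stand-in file with a 'pruned.study' node (hypothetical identifiers)
demoFile <- file.path(tempdir(), "demo_pruned.gds")
gdsDemo <- createfn.gds(demoFile)
add.gdsn(gdsDemo, "pruned.study", c("s1", "s5", "s9"))
closefn.gds(gdsDemo)

## Re-open the file and read the pruned SNV identifiers as a character vector
gdsDemo <- openfn.gds(demoFile)
prunedSNVs <- read.gdsn(index.gdsn(gdsDemo, "pruned.study"))
closefn.gds(gdsDemo)

length(prunedSNVs)

## Remove the stand-in file
unlink(demoFile, force = TRUE)
```

On a real Profile GDS file, the same read.gdsn(index.gdsn(...)) call would have to run before the file is closed and removed.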