This function computes the list of pruned SNVs for a specific profile. When a group of SNVs is in linkage disequilibrium (LD), only one SNV from that group is retained. The linkage disequilibrium is calculated with the snpgdsLDpruning() function. The initial list of SNVs passed to the snpgdsLDpruning() function can be specified by the user.
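To make the idea of retaining one SNV per linked group concrete, here is a minimal toy sketch in base R. It is an illustration only and not the algorithm implemented by snpgdsLDpruning(): the genotype matrix, the helper name pruneToy and the simple greedy rule are all assumptions made for this example, and no sliding window is used.

```r
## Toy genotype matrix (0/1/2 allele counts) for 12 samples.
## snv2 is an exact copy of snv1 (perfect LD); snv3 is uncorrelated with snv1.
snv1 <- c(0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2)
snv2 <- snv1
snv3 <- rep(c(0, 1), times = 6)
geno <- cbind(snv1 = snv1, snv2 = snv2, snv3 = snv3)

## Hypothetical helper: keep an SNV only if its absolute correlation with
## every SNV already kept is at or below the threshold.
pruneToy <- function(geno, threshold) {
    ldMat <- abs(cor(geno))
    kept <- character(0)
    for (snv in colnames(geno)) {
        if (all(ldMat[snv, kept] <= threshold)) {
            kept <- c(kept, snv)
        }
    }
    kept
}

pruneToy(geno, threshold = sqrt(0.1))
#> [1] "snv1" "snv3"
```

In this toy case snv2 is dropped because it is perfectly correlated with snv1, while snv3 is retained.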

pruningSample(
  gdsReference,
  method = c("corr", "r", "dprime", "composite"),
  currentProfile,
  studyID,
  listSNP = NULL,
  slideWindowMaxBP = 500000L,
  thresholdLD = sqrt(0.1),
  np = 1L,
  verbose = FALSE,
  chr = NULL,
  superPopMinAF = NULL,
  keepPrunedGDS = TRUE,
  pathProfileGDS = NULL,
  keepFile = FALSE,
  pathPrunedGDS = ".",
  outPrefix = "pruned"
)

Arguments

gdsReference

an object of class gds.class (a GDS file), the opened 1KG GDS file (reference data set).

method

a character string that represents the method that will be used to calculate the linkage disequilibrium in the snpgdsLDpruning() function. The 4 possible values are: "corr", "r", "dprime" and "composite". Default: "corr".

currentProfile

a character string corresponding to the profile identifier used in LD pruning done by the snpgdsLDpruning() function. A Profile GDS file corresponding to the profile identifier must exist and be located in the pathProfileGDS directory.

studyID

a character string corresponding to the study identifier used in the snpgdsLDpruning() function. The study identifier must be present in the Profile GDS file.

listSNP

a vector of SNV identifiers specifying the SNVs to be passed to the pruning function; if NULL, all SNVs are used in the snpgdsLDpruning() function. Default: NULL.

slideWindowMaxBP

a single positive integer that represents the maximum number of base pairs (bp) in the sliding window. This parameter is used for the LD pruning done in the snpgdsLDpruning() function. Default: 500000L.

thresholdLD

a single numeric value that represents the LD threshold used in the snpgdsLDpruning() function. Default: sqrt(0.1).

np

a single positive integer specifying the number of threads to be used. Default: 1L.

verbose

a logical indicating if information is shown during the process in the snpgdsLDpruning() function. Default: FALSE.

chr

a character string representing the chromosome where the selected SNVs should belong. Only one chromosome can be handled. If NULL, the chromosome is not used as a filtering criterion. Default: NULL.

superPopMinAF

a single positive numeric representing the minimum allelic frequency used to select the SNVs. If NULL, the allelic frequency is not used as a filtering criterion. Default: NULL.

keepPrunedGDS

a logical indicating if the information about the pruned SNVs should be added to the Profile GDS file. Default: TRUE.

pathProfileGDS

a character string representing the directory where the Profile GDS files are located. The directory must exist.

keepFile

a logical indicating if RDS files containing the information about the pruned SNVs must be created. Default: FALSE.

pathPrunedGDS

a character string representing the path to an existing directory. Default: ".".

outPrefix

a character string that represents the prefix of the RDS files that will be generated. The RDS files are only generated when the parameter keepFile=TRUE. Default: "pruned".
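As a small worked check of the thresholdLD default, and assuming the threshold is compared against a correlation coefficient r (as the "corr" and "r" methods suggest), a value of sqrt(0.1) corresponds to an r-squared cutoff of 0.1:

```r
## Default LD threshold from the function signature
thresholdLD <- sqrt(0.1)

## Value on the correlation (r) scale
round(thresholdLD, 3)
#> [1] 0.316

## Equivalent cutoff on the r-squared scale (assumption: threshold applies to r)
all.equal(thresholdLD^2, 0.1)
#> [1] TRUE
```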

Value

The function returns 0L when successful.

Author

Pascal Belleau, Astrid Deschênes and Alexander Krasnitz

Examples


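## Required libraries: gdsfmt for GDS files, SNPRelate for snpgdsOpen(),
## and RAIDS for pruningSample()
library(gdsfmt)
library(SNPRelate)
library(RAIDS)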

## Path to the demo Reference GDS file located in this package
dataDir <- system.file("extdata/tests", package="RAIDS")
fileGDS <- file.path(dataDir, "ex1_good_small_1KG.gds")

## The data.frame containing the information about the study
## The 3 mandatory columns: "study.id", "study.desc", "study.platform"
## The entries should be strings, not factors (stringsAsFactors=FALSE)
studyDF <- data.frame(study.id = "MYDATA",
                        study.desc = "Description",
                        study.platform = "PLATFORM",
                        stringsAsFactors = FALSE)

## The data.frame containing the information about the samples
## The entries should be strings, not factors (stringsAsFactors=FALSE)
samplePED <- data.frame(Name.ID = c("ex1", "ex2"),
                    Case.ID = c("Patient_h11", "Patient_h12"),
                    Diagnosis = rep("Cancer", 2),
                    Sample.Type = rep("Primary Tumor", 2),
                    Source = rep("Databank B", 2), stringsAsFactors = FALSE)
rownames(samplePED) <- samplePED$Name.ID

## Temporary Profile GDS file
profileFile <- file.path(tempdir(), "ex1.gds")

## Copy the Profile GDS file demo that has not been pruned yet
file.copy(file.path(dataDir, "ex1_demo.gds"), profileFile)
#> [1] TRUE

## Open 1KG file
gds1KG <- snpgdsOpen(fileGDS)

## Compute the list of pruned SNVs for a specific profile 'ex1'
## and save it in the Profile GDS file 'ex1.gds'
pruningSample(gdsReference=gds1KG, currentProfile=c("ex1"),
              studyID = studyDF$study.id, pathProfileGDS=tempdir())
#> [1] 0

## Close the Reference GDS file (important)
closefn.gds(gds1KG)

## Check content of Profile GDS file
## The 'pruned.study' entry should be present
content <- openfn.gds(profileFile)
content
#> File: /tmp/Rtmps2Gf87/ex1.gds (4.3K)
#> +    [  ]
#> |--+ Ref.count   { SparseInt16 11000x1, 568B }
#> |--+ Alt.count   { SparseInt16 11000x1, 74B }
#> |--+ Total.count   { SparseInt16 11000x1, 580B }
#> |--+ study.list   [ data.frame ] *
#> |  |--+ study.id   { Str8 1, 7B }
#> |  |--+ study.desc   { Str8 1, 12B }
#> |  \--+ study.platform   { Str8 1, 9B }
#> |--+ study.annot   [ data.frame ] *
#> |  |--+ data.id   { Str8 1, 4B }
#> |  |--+ case.id   { Str8 1, 12B }
#> |  |--+ sample.type   { Str8 1, 14B }
#> |  |--+ diagnosis   { Str8 1, 7B }
#> |  |--+ source   { Str8 1, 11B }
#> |  \--+ study.id   { Str8 1, 7B }
#> |--+ geno.ref   { Bit2 11000x1 LZMA_ra(10.7%), 301B }
#> \--+ pruned.study   { Str8 40, 379B }

## Close the Profile GDS file (important)
closefn.gds(content)

## Remove Profile GDS file (created for demo purpose)
unlink(profileFile, force=TRUE)
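
The example above only prints the structure of the Profile GDS file. A possible way to extract the identifiers stored in a node such as 'pruned.study' is sketched below with standard gdsfmt calls. To keep the sketch self-contained, it builds a small stand-in file; the file name toy_profile.gds and the identifiers "s1", "s5" and "s9" are hypothetical and are not produced by pruningSample().

```r
library(gdsfmt)

## Hypothetical toy file standing in for a pruned Profile GDS file
toyFile <- file.path(tempdir(), "toy_profile.gds")
toyGDS <- createfn.gds(toyFile)
add.gdsn(toyGDS, "pruned.study", c("s1", "s5", "s9"))
closefn.gds(toyGDS)

## Reopen the file (read-only by default) and read the node content
toyGDS <- openfn.gds(toyFile)
prunedSNVs <- read.gdsn(index.gdsn(toyGDS, "pruned.study"))

## Close the file before using the extracted values (important)
closefn.gds(toyGDS)

prunedSNVs
#> [1] "s1" "s5" "s9"

## Remove the toy file (created for demo purpose)
unlink(toyFile, force = TRUE)
```

With a real Profile GDS file, the same index.gdsn() and read.gdsn() calls would be applied to the object returned by openfn.gds(profileFile), before the file is closed and removed.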