R/processStudy.R
createStudy2GDS1KG.Rd
The function uses the information from the Reference GDS file
and the RDS Sample Description file to create the Profile GDS files. One
Profile GDS file is created for each entry present in the listProfiles
parameter.
a character string representing the path to the
directory containing the genotype file (for instance, the output of
SNP-pileup) for each sample. The files must be compressed (gz files) and
be named after the sample identifiers. A sample with the "Name.ID"
identifier would have an associated file called:
"Name.ID.vcf.gz" if genoSource is "VCF";
"Name.ID.generic.txt.gz" if genoSource is "generic";
"Name.ID.txt.gz" if genoSource is "snp-pileup".
a character string representing the path to the
RDS file that contains the information about the samples to analyse.
The RDS file must include a data.frame with these mandatory columns:
"Name.ID", "Case.ID", "Sample.Type", "Diagnosis", "Source". All columns
must be character strings. The data.frame must contain the information
for all the samples passed in the listProfiles parameter. Only one of
filePedRDS and pedStudy can be defined.
a data.frame with these mandatory columns: "Name.ID",
"Case.ID", "Sample.Type", "Diagnosis", "Source". All columns must be
character strings (no factor). The data.frame must contain the
information for all the samples passed in the listProfiles parameter.
Only one of filePedRDS and pedStudy can be defined.
a character
string representing the file name of
the Reference GDS file. The file must exist.
a single positive integer representing the current
identifier for the batch. Beware, this field is no longer stored.
Default: 1.
a data.frame containing the information about the
study associated with the analysed sample(s). The data.frame must have
these 3 columns: "study.id", "study.desc", "study.platform". All columns
must be character strings (no factor).
a vector of character strings corresponding
to the profile identifiers that will have a Profile GDS file created. The
profile identifiers must be present in the "Name.ID" column of the Profile
RDS file passed to the filePedRDS parameter.
If NULL, all profiles present in the filePedRDS file are selected.
Default: NULL.
a character string representing the path to
the directory where the Profile GDS files will be created.
Default: NULL.
a character string with three possible values:
'snp-pileup', 'generic' or 'VCF'. It specifies whether the genotype files
are generated by snp-pileup (Facets), are generic format CSV files, or
are VCF files. A generic format CSV file must contain at least these columns:
'Chromosome', 'Position', 'Ref', 'Alt', 'Count', 'File1R' and 'File1A'.
'Count' is the depth at the specified position;
'File1R' is the depth of the reference allele and
'File1A' is the depth of the specific alternative allele.
A VCF file must contain at least these genotype
fields: GT, AD, DP.
a logical indicating whether messages should be
printed. Default: FALSE.
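The file naming convention described for the genotype directory can be sketched as a small lookup. The helper below, `expectedGenoFile`, is hypothetical and not part of RAIDS; it only illustrates how a "Name.ID" identifier and a genoSource value map to the expected file name.

```r
## Hypothetical helper (not part of RAIDS): expected genotype file path
## for one profile, following the naming convention described above
expectedGenoFile <- function(pathGeno, nameID,
        genoSource=c("snp-pileup", "generic", "VCF")) {
    ## Stop on any value other than the three documented ones
    genoSource <- match.arg(genoSource)
    suffix <- switch(genoSource,
        "VCF"=".vcf.gz",
        "generic"=".generic.txt.gz",
        "snp-pileup"=".txt.gz")
    file.path(pathGeno, paste0(nameID, suffix))
}

## Expected file names for the profile "ex1"
basename(expectedGenoFile("data", "ex1", "snp-pileup"))
basename(expectedGenoFile("data", "ex1", "generic"))
basename(expectedGenoFile("data", "ex1", "VCF"))
```

Such a helper could be combined with file.exists() to confirm that every requested profile has a genotype file before the Profile GDS files are created.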
The function returns 0L when successful.
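The constraints on the sample description data.frame (mandatory columns, character strings, requested profiles present in "Name.ID") can be checked before the call. The following is a minimal sketch under those stated constraints; `checkPedStudy` and `pedDemo` are hypothetical names, and the checks are not claimed to match the validation done inside the package.

```r
## Hypothetical helper (not part of RAIDS): check a sample description
## data.frame against the constraints listed in the arguments
checkPedStudy <- function(pedStudy, listProfiles=NULL) {
    mandatory <- c("Name.ID", "Case.ID", "Sample.Type",
                    "Diagnosis", "Source")
    missingCols <- setdiff(mandatory, colnames(pedStudy))
    if (length(missingCols) > 0) {
        stop("Missing mandatory column(s): ",
                paste(missingCols, collapse=", "))
    }
    ## Every mandatory column must be a character vector (no factor)
    isChar <- vapply(pedStudy[, mandatory, drop=FALSE],
                        is.character, logical(1))
    if (!all(isChar)) {
        stop("Column(s) not in character strings: ",
                paste(mandatory[!isChar], collapse=", "))
    }
    ## Every requested profile must be present in the Name.ID column
    unknown <- setdiff(listProfiles, pedStudy$Name.ID)
    if (length(unknown) > 0) {
        stop("Profile(s) absent from Name.ID: ",
                paste(unknown, collapse=", "))
    }
    invisible(TRUE)
}

## Illustrative data.frame with the five mandatory columns
pedDemo <- data.frame(Name.ID=c("ex1", "ex2"),
    Case.ID=c("Patient_h11", "Patient_h12"),
    Diagnosis=rep("Cancer", 2),
    Sample.Type=rep("Primary Tumor", 2),
    Source=rep("Databank B", 2), stringsAsFactors=FALSE)

checkPedStudy(pedDemo, listProfiles=c("ex1"))
```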
## Path to the demo 1KG GDS file located in this package
dataDir <- system.file("extdata/tests", package="RAIDS")
fileGDS <- file.path(dataDir, "ex1_good_small_1KG.gds")
## The data.frame containing the information about the study
## The 3 mandatory columns: "study.id", "study.desc", "study.platform"
## The entries should be strings, not factors (stringsAsFactors=FALSE)
studyDF <- data.frame(study.id = "MYDATA",
    study.desc = "Description",
    study.platform = "PLATFORM",
    stringsAsFactors = FALSE)
## The data.frame containing the information about the samples
## The entries should be strings, not factors (stringsAsFactors=FALSE)
samplePED <- data.frame(Name.ID=c("ex1", "ex2"),
    Case.ID=c("Patient_h11", "Patient_h12"),
    Diagnosis=rep("Cancer", 2),
    Sample.Type=rep("Primary Tumor", 2),
    Source=rep("Databank B", 2), stringsAsFactors=FALSE)
rownames(samplePED) <- samplePED$Name.ID
## Create the Profile GDS file for the profiles in the 'listProfiles' vector
## (in this case, profile "ex1")
## The Profile GDS file is created in the pathProfileGDS directory
result <- createStudy2GDS1KG(pathGeno=dataDir,
    pedStudy=samplePED, fileNameGDS=fileGDS,
    studyDF=studyDF, listProfiles=c("ex1"),
    pathProfileGDS=tempdir(),
    genoSource="snp-pileup",
    verbose=FALSE)
## The function returns 0L when successful
result
#> [1] 0
## The Profile GDS file 'ex1.gds' has been created in the
## specified directory
list.files(tempdir())
#> [1] "downlit" "ex1.gds" "filebb87cbdcaaa"
## Remove Profile GDS file (created for demo purpose)
unlink(file.path(tempdir(), "ex1.gds"), force=TRUE)
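For the 'generic' genotype source, the column layout described in the arguments can be illustrated by writing a small compressed file. This is only a sketch: the values are made up, and the comma separator and the "chr1" chromosome notation are assumptions, since the description above only names the required columns and calls the file a CSV.

```r
## Illustrative values only; the column names come from the description
## of the 'generic' format
genericDF <- data.frame(Chromosome=c("chr1", "chr1"),
    Position=c(10000L, 20000L),
    Ref=c("A", "G"), Alt=c("C", "T"),
    Count=c(30L, 25L),
    File1R=c(18L, 25L), File1A=c(12L, 0L),
    stringsAsFactors=FALSE)

## The file name follows the "Name.ID.generic.txt.gz" convention
fileGeneric <- file.path(tempdir(), "ex1.generic.txt.gz")

## Write the table through a gzip connection (comma separator assumed)
con <- gzfile(fileGeneric, open="wt")
write.csv(genericDF, file=con, row.names=FALSE, quote=FALSE)
close(con)

## Read the compressed file back to confirm the column names
colsRead <- colnames(read.csv(fileGeneric))
colsRead

## Remove the demo file
unlink(fileGeneric, force=TRUE)
```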