R/synthetic.R
computeSyntheticROC.Rd
The function calculates the area under the ROC curve (AUROC) of the ancestry inferences for specific values of D and K, using the inferred ancestry results from the synthetic profiles. The calculations are done on each super-population separately as well as on all the results together.
computeSyntheticROC(
    matKNN,
    matKNNAncestryColumn,
    pedCall,
    pedCallAncestryColumn,
    listCall = c("EAS", "EUR", "AFR", "AMR", "SAS")
)
matKNN
a data.frame containing the inferred ancestry results
for fixed values of D and K. One of the column names of the
data.frame must correspond to the matKNNAncestryColumn
argument.
matKNNAncestryColumn
a character string representing the
name of the column that contains the inferred ancestry for the specified
synthetic profiles. The column must be present in the matKNN
argument.
pedCall
a data.frame containing the super-population information
from the 1KG GDS file
for the profiles used to generate the synthetic profiles. The data.frame
must contain a column named as the pedCallAncestryColumn argument.
The row names must correspond to the sample identifiers (mandatory).
pedCallAncestryColumn
a character string representing the
name of the column that contains the known ancestry for the reference
profiles in the Reference GDS file. The column must be present in
the pedCall argument.
listCall
a vector of character strings representing
the list of all possible ancestry assignments.
Default: c("EAS", "EUR", "AFR", "AMR", "SAS").
A list containing 3 entries:
matAUROC.All
a data.frame containing the AUROC for all
the ancestry results.
matAUROC.Call
a data.frame containing the AUROC
information for each super-population.
listROC.Call
a list containing the output from the
roc function for each super-population.
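The per-super-population calculation can be pictured as a one-vs-rest comparison: for each entry of listCall, the known ancestry and the inferred ancestry are each turned into a 0/1 indicator, and an AUROC is computed on that pair. The following is only a minimal base R sketch of that idea and not the package's implementation. The vectors known and inferred are made-up toy data, and aucBinary is a hypothetical helper that uses the Mann-Whitney rank formula instead of the roc function.

```r
# Toy data (hypothetical, not taken from the package): known and inferred
# super-populations for eight synthetic profiles.
known    <- c("EAS", "EAS", "EUR", "EUR", "AFR", "AFR", "AMR", "SAS")
inferred <- c("EAS", "EUR", "EUR", "EUR", "AFR", "AFR", "EUR", "SAS")
listCall <- c("EAS", "EUR", "AFR", "AMR", "SAS")

# Hypothetical helper: AUROC from the Mann-Whitney rank statistic.
# Tied scores receive average ranks (the default of rank()).
aucBinary <- function(label, score) {
    nCase <- sum(label == 1)
    nCtrl <- sum(label == 0)
    if (nCase == 0 || nCtrl == 0) return(NA_real_)
    r <- rank(score)
    (sum(r[label == 1]) - nCase * (nCase + 1) / 2) / (nCase * nCtrl)
}

# One-vs-rest: for each super-population, code a profile as 1 when its
# ancestry matches that super-population and 0 otherwise.
aucBySuperPop <- vapply(listCall, function(pop) {
    aucBinary(label = as.integer(known == pop),
              score = as.integer(inferred == pop))
}, numeric(1))

print(aucBySuperPop)
```

With a 0/1 predictor, this AUROC reduces to the mean of sensitivity and specificity for the super-population, so a super-population that is never inferred correctly but also never over-called stays at 0.5.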
## Loading demo dataset containing pedigree information for synthetic
## profiles and known ancestry of the profiles used to generate the
## synthetic profiles
data(pedSynthetic)
## Loading demo dataset containing the inferred ancestry results
## for the synthetic data
data(matKNNSynthetic)
## The inferred ancestry results for the synthetic data using
## values of D=5 and K=6
matKNN <- matKNNSynthetic[matKNNSynthetic$K == 6 & matKNNSynthetic$D == 5, ]
## Compile statistics from the
## synthetic profiles for fixed values of D and K
results <- RAIDS:::computeSyntheticROC(matKNN=matKNN,
    matKNNAncestryColumn="SuperPop",
    pedCall=pedSynthetic, pedCallAncestryColumn="superPop",
    listCall=c("EAS", "EUR", "AFR", "AMR", "SAS"))
results$matAUROC.All
#> pcaD K ROC.AUC ROC.CI N NBNA
#> 1 5 6 0.6883929 0 52 0
results$matAUROC.Call
#> pcaD K Call L AUC H
#> 1 5 6 EAS 0.5197913 0.6904762 0.8611611
#> 2 5 6 EUR 0.4807257 0.6547619 0.8287981
#> 3 5 6 AFR 0.8168697 0.9154135 1.0000000
#> 4 5 6 AMR 0.4009287 0.5681818 0.7354350
#> 5 5 6 SAS 0.4729463 0.6404762 0.8080061
results$listROC.Call
#> $EAS
#>
#> Call:
#> roc.formula(formula = fCur ~ predMat[, j], ci = TRUE, quiet = TRUE)
#>
#> Data: predMat[, j] in 42 controls (fCur 0) < 10 cases (fCur 1).
#> Area under the curve: 0.6905
#> 95% CI: 0.5198-0.8612 (DeLong)
#>
#> $EUR
#>
#> Call:
#> roc.formula(formula = fCur ~ predMat[, j], ci = TRUE, quiet = TRUE)
#>
#> Data: predMat[, j] in 42 controls (fCur 0) < 10 cases (fCur 1).
#> Area under the curve: 0.6548
#> 95% CI: 0.4807-0.8288 (DeLong)
#>
#> $AFR
#>
#> Call:
#> roc.formula(formula = fCur ~ predMat[, j], ci = TRUE, quiet = TRUE)
#>
#> Data: predMat[, j] in 38 controls (fCur 0) < 14 cases (fCur 1).
#> Area under the curve: 0.9154
#> 95% CI: 0.8169-1 (DeLong)
#>
#> $AMR
#>
#> Call:
#> roc.formula(formula = fCur ~ predMat[, j], ci = TRUE, quiet = TRUE)
#>
#> Data: predMat[, j] in 44 controls (fCur 0) < 8 cases (fCur 1).
#> Area under the curve: 0.5682
#> 95% CI: 0.4009-0.7354 (DeLong)
#>
#> $SAS
#>
#> Call:
#> roc.formula(formula = fCur ~ predMat[, j], ci = TRUE, quiet = TRUE)
#>
#> Data: predMat[, j] in 42 controls (fCur 0) < 10 cases (fCur 1).
#> Area under the curve: 0.6405
#> 95% CI: 0.4729-0.808 (DeLong)
#>
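In the output above, the L and H columns of matAUROC.Call match the lower and upper bounds of the 95% DeLong confidence intervals printed in listROC.Call. As a hedged illustration of how the table might be read, the sketch below rebuilds it from the printed values and flags the super-populations whose lower bound stays above 0.5. The object names matAUROCCall, aboveChance, and weakest are hypothetical and are not part of the package.

```r
# Values copied from the results$matAUROC.Call output printed above.
matAUROCCall <- data.frame(
    pcaD = 5, K = 6,
    Call = c("EAS", "EUR", "AFR", "AMR", "SAS"),
    L    = c(0.5197913, 0.4807257, 0.8168697, 0.4009287, 0.4729463),
    AUC  = c(0.6904762, 0.6547619, 0.9154135, 0.5681818, 0.6404762),
    H    = c(0.8611611, 0.8287981, 1.0000000, 0.7354350, 0.8080061),
    stringsAsFactors = FALSE
)

# Super-populations whose lower confidence bound is above 0.5, i.e. whose
# AUROC interval excludes chance-level performance.
aboveChance <- matAUROCCall$Call[matAUROCCall$L > 0.5]

# Super-population with the lowest AUROC point estimate.
weakest <- matAUROCCall$Call[which.min(matAUROCCall$AUC)]

print(aboveChance)
print(weakest)
```

On this small demo dataset (52 profiles), only EAS and AFR have intervals that exclude 0.5, and AMR has the lowest point estimate.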