R/processStudy.R
computeKNNRefSample.Rd
The function runs a k-nearest neighbors analysis on one specific profile. The function uses the 'knn' package.
listEigenvector
a list with 3 entries:
'sample.id', 'eigenvector.ref' and 'eigenvector'. The list represents
the PCA done on the 1KG reference profiles and one specific profile
projected onto it. The 'sample.id' entry must contain only one identifier
(one profile).
listCatPop
a vector of character strings
representing the list of possible ancestry assignations. Default:
c("EAS", "EUR", "AFR", "AMR", "SAS").
spRef
a vector of character strings representing the
known super population ancestry for the 1KG profiles. The 1KG profile
identifiers are used as names for the vector.
fieldPopInfAnc
a character string representing the name of
the column that will contain the inferred ancestry for the specified
profile. Default: "SuperPop".
kList
a vector of integers representing the list of
values tested for the K parameter. The K parameter represents the
number of neighbors used in the K-nearest neighbor analysis. If NULL,
the value seq(2, 15, 1) is assigned.
Default: seq(2, 15, 1).
pcaList
a vector of integers representing the list of
values tested for the D parameter. The D parameter represents the
number of dimensions used in the PCA analysis. If NULL,
the value seq(2, 15, 1) is assigned.
Default: seq(2, 15, 1).
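The kList and pcaList arguments together define a grid of (D, K) combinations, with one ancestry call per combination. The following is a minimal sketch of that idea on toy data, not the package's implementation. It assumes the knn() function from the 'class' package, and every object in it (eigRef, spRefToy, eigProfile, kListToy, pcaListToy, toyKNN) is hypothetical.

```r
library(class)  # knn(); assumed for this sketch only

set.seed(42)
## Hypothetical reference eigenvectors: 20 profiles x 3 PCA dimensions,
## forming two well-separated clusters
eigRef <- rbind(matrix(rnorm(30, mean=0, sd=0.1), ncol=3),
                matrix(rnorm(30, mean=5, sd=0.1), ncol=3))
rownames(eigRef) <- paste0("ref", seq_len(20))

## Known super population of each reference profile, named by profile
spRefToy <- setNames(rep(c("EUR", "AFR"), each=10), rownames(eigRef))

## One profile projected onto the same PCA space (close to second cluster)
eigProfile <- matrix(c(4.9, 5.1, 5.0), nrow=1,
                     dimnames=list("profile1", NULL))

kListToy <- seq(2, 4, 1)    # values tested for K
pcaListToy <- seq(2, 3, 1)  # values tested for D

## One row per (D, K) combination; K varies fastest
grid <- expand.grid(K=kListToy, D=pcaListToy)
grid$SuperPop <- vapply(seq_len(nrow(grid)), function(i) {
    dims <- seq_len(grid$D[i])  # keep only the first D dimensions
    as.character(knn(train=eigRef[, dims, drop=FALSE],
                     test=eigProfile[, dims, drop=FALSE],
                     cl=factor(spRefToy[rownames(eigRef)],
                               levels=c("EUR", "AFR")),
                     k=grid$K[i]))
}, character(1))

toyKNN <- data.frame(sample.id="profile1", D=grid$D, K=grid$K,
                     SuperPop=grid$SuperPop)
```

The resulting toyKNN data.frame has one row per combination, in the same four-column layout as the matKNN entry returned by the function.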
a list containing 2 entries:
sample.id
a vector of character strings
representing the identifier of the profile analysed.
matKNN
a data.frame containing the super population
inference for the profile for different values of PCA
dimensions D and k-neighbors values K. The fourth column title
corresponds to the fieldPopInfAnc parameter.
The data.frame contains 4 columns:
    sample.id
    a character string representing
    the identifier of the profile analysed.
    D
    a numeric value representing
    the value of the PCA dimension used to infer the ancestry.
    K
    a numeric value representing
    the value of the k-neighbors used to infer the ancestry.
    fieldPopInfAnc
    a character string representing
    the inferred ancestry.
## Load the demo PCA on the synthetic profiles projected on the
## demo 1KG reference PCA
data(demoPCASyntheticProfiles)
## Load the known ancestry for the demo 1KG reference profiles
data(demoKnownSuperPop1KG)
## The PCA with 1 profile projected on the 1KG reference PCA
## Only one profile is retained
pca <- demoPCASyntheticProfiles
pca$sample.id <- pca$sample.id[1]
pca$eigenvector <- pca$eigenvector[1, , drop=FALSE]
## Run the k-nearest neighbors analysis on the profile projected on the 1KG PCA
results <- computeKNNRefSample(listEigenvector=pca,
    listCatPop=c("EAS", "EUR", "AFR", "AMR", "SAS"),
    spRef=demoKnownSuperPop1KG, fieldPopInfAnc="SuperPop",
    kList=seq(10, 15, 1), pcaList=seq(10, 15, 1))
## The assigned ancestry to the profile for different values of K and D
head(results$matKNN)
#>         sample.id  D  K SuperPop
#> 1 1.ex1.HG00246.1 10 10      SAS
#> 2 1.ex1.HG00246.1 10 11      SAS
#> 3 1.ex1.HG00246.1 10 12      SAS
#> 4 1.ex1.HG00246.1 10 13      SAS
#> 5 1.ex1.HG00246.1 10 14      SAS
#> 6 1.ex1.HG00246.1 10 15      EAS
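The calls can differ across (D, K) combinations, as in the six rows above. One simple way to inspect this is to tabulate the calls. The sketch below rebuilds those six rows by hand so that it runs without the package. The names matKNNHead, callCounts and majorityCall are hypothetical, and the majority vote is only an illustration, not the package's method for choosing K and D.

```r
## Rows copied from the head(results$matKNN) output shown above
matKNNHead <- data.frame(
    sample.id="1.ex1.HG00246.1",
    D=10L,
    K=10:15,
    SuperPop=c("SAS", "SAS", "SAS", "SAS", "SAS", "EAS"))

## Number of (D, K) combinations supporting each super population
callCounts <- table(matKNNHead$SuperPop)

## Most frequent call among these six rows
majorityCall <- names(callCounts)[which.max(callCounts)]
```

Here five of the six combinations give SAS and one gives EAS, so majorityCall is "SAS" for this subset of rows.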