Skip to contents

place_mutations() generates a SNP data.frame by randomly placing a fixed number of mutations on a given genealogy. It means that all the sites are perfectly linked with each other.

make_snp_chromosome() simulates a SNP data.frame on a chromosome using place_mutations() with recombination. make_snp() calls it for each chromosome in parallel.

Use RNGkind("L'Ecuyer-CMRG") and set.seed() to get reproducible results. The number of CPU cores used can be configured via mc.cores option.

Usage

make_snp(samples, ss)

make_snp_chromosome(genealogy, segsites)

place_mutations(genealogy, segsites, v_sampled = NULL)

Arguments

samples

A data.frame: sample_family or its tranfromation by gather_segments(). Using the latter will improve performance when the function is called many times, e.g., in make_snp().

ss

A sequence of segsites; its length is the number of chromosome; each element is the number of segsites on a chromosome. If a named vector is given, the output is also named.

genealogy

An output from make_gene_genealogy().

segsites

The number of segregating sites on a segment.

v_sampled

The sampled vertices. Use this to fix the output order.

Value

make_snp() returns a list of data.frame for each chromosome.

make_snp_chromosome() and place_mutations() returns a data.frame with segregating sites in columns, and sample segments in rows.

Details

It is assumed that each node in the genealogy has one recombination event on each chromosome. In other words, a chromosome has a fixed number of recombination events, equal to the number of nodes in the genealogy. Their locations are randomly assigned, and segregating sites are placed among them. For example, a chromosome with 3 segsites on a genealogy with 10 nodes can be illustrated like this: 5' rrrSrrSrrrrSr 3'.

See also

write_vcf() to write the output in VCF format.

Examples

RNGkind("L'Ecuyer-CMRG")
set.seed(666)
result = tekka("-y20 -l2 --sa 2,2 --sj 2,2")
samples = result$sample_family[[1L]]
segments = gather_segments(samples)
genealogy = make_gene_genealogy(segments)

place_mutations(genealogy, 3L) |> str()
#> 'data.frame':	32 obs. of  3 variables:
#>  $ : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ : int  0 0 0 0 0 0 0 0 0 0 ...

ss = c(chr1 = 3, chr2 = 2)
make_snp(segments, ss) |> str()
#> List of 2
#>  $ chr1:'data.frame':	32 obs. of  3 variables:
#>   ..$ : int [1:32] 0 0 0 1 0 0 0 0 0 0 ...
#>   ..$ : int [1:32] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ : int [1:32] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ chr2:'data.frame':	32 obs. of  2 variables:
#>   ..$ : int [1:32] 0 0 1 0 0 0 0 0 0 0 ...
#>   ..$ : int [1:32] 0 0 0 0 0 0 0 0 0 0 ...