place_mutations()
generates a SNP data.frame by randomly placing a fixed
number of mutations on a given genealogy.
It means that all the sites are perfectly linked with each other.
make_snp_chromosome()
simulates a SNP data.frame on a chromosome using
place_mutations()
with recombination.
make_snp()
calls it for each chromosome in parallel.
Use RNGkind("L'Ecuyer-CMRG")
and set.seed()
to get reproducible results.
The number of CPU cores used can be configured via mc.cores
option.
Usage
make_snp(samples, ss)
make_snp_chromosome(genealogy, segsites)
place_mutations(genealogy, segsites, v_sampled = NULL)
Arguments
- samples
A data.frame:
sample_family
or its tranfromation bygather_segments()
. Using the latter will improve performance when the function is called many times, e.g., inmake_snp()
.- ss
A sequence of
segsites
; its length is the number of chromosome; each element is the number of segsites on a chromosome. If a named vector is given, the output is also named.- genealogy
An output from
make_gene_genealogy()
.- segsites
The number of segregating sites on a segment.
- v_sampled
The sampled vertices. Use this to fix the output order.
Value
make_snp()
returns a list of data.frame for each chromosome.
make_snp_chromosome()
and place_mutations()
returns
a data.frame with segregating sites in columns, and sample segments in rows.
Details
It is assumed that each node in the genealogy has one recombination event
on each chromosome. In other words, a chromosome has a fixed number of
recombination events, equal to the number of nodes in the genealogy.
Their locations are randomly assigned, and segregating sites are placed among them.
For example, a chromosome with 3 segsites on a genealogy with 10 nodes can be
illustrated like this:
5' rrrSrrSrrrrSr 3'
.
See also
write_vcf()
to write the output in VCF format.
Examples
RNGkind("L'Ecuyer-CMRG")
set.seed(666)
result = tekka("-y20 -l2 --sa 2,2 --sj 2,2")
samples = result$sample_family[[1L]]
segments = gather_segments(samples)
genealogy = make_gene_genealogy(segments)
place_mutations(genealogy, 3L) |> str()
#> 'data.frame': 32 obs. of 3 variables:
#> $ : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ : int 0 0 0 0 0 0 0 0 0 0 ...
ss = c(chr1 = 3, chr2 = 2)
make_snp(segments, ss) |> str()
#> List of 2
#> $ chr1:'data.frame': 32 obs. of 3 variables:
#> ..$ : int [1:32] 0 0 0 1 0 0 0 0 0 0 ...
#> ..$ : int [1:32] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ : int [1:32] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chr2:'data.frame': 32 obs. of 2 variables:
#> ..$ : int [1:32] 0 0 1 0 0 0 0 0 0 0 ...
#> ..$ : int [1:32] 0 0 0 0 0 0 0 0 0 0 ...