MCMC sampling of gene genealogies conditional on genetic marker data
The variation observed today in the human genome is a result of genetic processes, such as mutation, acting over time on the DNA of our ancestors. The 'gene genealogy' for a sample of genes from unrelated individuals is a tree describing the historical ancestral genetic events and relationships giving rise to this variation. However, since the time scale is on the order of tens of thousands of years, the true gene genealogy cannot be known. I am interested in using information from genetic markers (variable DNA sites) to infer genealogical relationships between unrelated individuals.
My research, which draws on statistics and population genetics, involves the development and dissemination of statistical approaches incorporating these ancestral trees. I have recently implemented an approach, sampletrees, to probabilistically sample genealogical trees that are likely to have given rise to a sample of genetic data. Current work on the sampler involves: development of an R package; computational changes to the sampler to improve MCMC mixing; and handling of missing or partially known genetic data.
I am also working on the application of ancestral tree sampling to statistical methodology for finding disease-predisposing genetic variants. In regions of the genome that harbour variants influencing human traits or disease, we expect that individuals who share a similar trait will be more closely related genetically near the trait-influencing variant. The genetic similarity between two individuals is reflected by their proximity in the ancestral tree. I am developing and evaluating statistics that relate similarity in disease state to genetic relatedness, as measured by the tree. Current work involves evaluating whether a tree-based approach can be used to find rare variants.
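To make the idea of relating disease state to tree proximity concrete, here is a minimal sketch in Python. It is an illustration only and is not the statistic under development in this research. It assumes a single known tree, summarised as a hypothetical matrix dist of pairwise tree distances between sampled sequences, and a 0/1 vector status marking cases. The function names, the toy data and the choice of statistic (mean distance between pairs of cases, assessed by permuting case labels) are all assumptions made for the example.

```python
import random
from itertools import combinations


def mean_case_pair_distance(dist, status):
    """Average tree distance over all pairs of cases (status == 1)."""
    cases = [i for i, s in enumerate(status) if s == 1]
    pairs = list(combinations(cases, 2))
    if not pairs:
        raise ValueError("need at least two cases")
    return sum(dist[i][j] for i, j in pairs) / len(pairs)


def permutation_p_value(dist, status, n_perm=2000, seed=0):
    """One-sided test: are cases closer together in the tree than expected
    if case labels were assigned at random?"""
    rng = random.Random(seed)
    observed = mean_case_pair_distance(dist, status)
    labels = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between status and tree position
        if mean_case_pair_distance(dist, labels) <= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy tree on six sequences: 0-2 form one clade, 3-5 another.
# Within-clade distance is 2, between-clade distance is 10.
dist = [
    [0, 2, 2, 10, 10, 10],
    [2, 0, 2, 10, 10, 10],
    [2, 2, 0, 10, 10, 10],
    [10, 10, 10, 0, 2, 2],
    [10, 10, 10, 2, 0, 2],
    [10, 10, 10, 2, 2, 0],
]
status = [1, 1, 1, 0, 0, 0]  # all three cases sit in the first clade

observed, p_value = permutation_p_value(dist, status)
print(observed, p_value)
```

In this toy example the observed statistic is 2.0, because every pair of cases lies within the same clade. Since the true tree is unknown, a sketch like this would in practice be applied to each sampled tree and the results combined over the sample, rather than computed on one fixed tree.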