Statistical Genetics and Pharmacogenomics Group

The Statistical Genetics and Pharmacogenomics research group focuses on developing and applying methodology for the analysis of genetic data across a range of study designs. Our ultimate aims are to improve our understanding of susceptibility to complex diseases and of their diagnosis, prognosis and prevention, and to identify predictors of treatment response.

Our research falls into two broad categories: i) the development and evaluation of novel methods for the analysis of genetic data; and ii) the application of robust and appropriate methods to analyse genetic datasets. Both are focussed on the common aims of improving our understanding of: i) susceptibility to diseases; ii) diagnosis, prognosis and prevention of disease; and iii) predictors of treatment response (“pharmacogenetics”). 

The methods that form the basis of our research include both traditional statistical methods and more modern approaches such as machine learning, as well as methodologies aimed at ensuring the robust design and conduct of genetic studies. We apply these methods across a broad range of diseases, including complex and infectious diseases. We also develop software to accompany our methods to ensure they are easily accessible.

Examples of some of our current research interests are highlighted below: 

i) Analysing genetic data with survival outcomes

In genetic association studies, particularly pharmacogenetic studies, the phenotype of interest is often best captured as a time-to-event outcome. Methods and software for analysing GWAS and sequencing studies are predominantly aimed at binary and quantitative outcomes, and an analysis bottleneck exists for survival analysis within a GWAS and sequencing context. Our group has developed methods and software for the analysis of GWAS with time-to-event outcomes, both for common and rare variants (https://www.liverpool.ac.uk/translational-medicine/research/statistical-genetics/survival-gwas/ and https://www.liverpool.ac.uk/translational-medicine/research/statistical-genetics/survival-gwas-sv/). Similar developments are currently underway for use with exome sequencing data, within our project recently funded by the MRC Methodology Research Panel.
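To illustrate what a time-to-event association test for a single variant can look like, the sketch below implements a two-group log-rank test (carriers versus non-carriers) using only the Python standard library. This is a simplified stand-in chosen for brevity and is not the method implemented in the group's software; the function name `logrank_test` and the toy data are hypothetical.

```python
import math

def logrank_test(times, events, carrier):
    """Two-group log-rank test for one variant (illustrative sketch only).

    times:   follow-up time for each individual
    events:  1 if the event was observed, 0 if the time is censored
    carrier: True if the individual carries the variant allele
    Returns (chi-square statistic with 1 df, p-value).
    """
    # Distinct times at which at least one event was observed.
    event_times = sorted({t for t, e in zip(times, events) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Individuals still at risk just before time t.
        n = sum(1 for ti in times if ti >= t)
        n1 = sum(1 for ti, c in zip(times, carrier) if ti >= t and c)
        # Events occurring exactly at time t, overall and among carriers.
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, c in zip(times, events, carrier)
                 if ti == t and e and c)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the carrier event count.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Toy data (made up): carriers experience the event earlier.
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [1] * 10
carrier = [True] * 5 + [False] * 5
chi2, p_value = logrank_test(times, events, carrier)
```

In a genome-wide setting a test of this kind would be repeated for every variant, which is where the computational bottleneck described above arises. Regression-based survival models are generally preferred in practice because they allow covariate adjustment.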

ii) Biomarker-guided randomised controlled trials

The ultimate aim of pharmacogenetics is to identify genetic markers with a view to maximising the benefit-risk ratio of treatments. However, testing the effectiveness of a pharmacogenetic-guided approach to treatment in improving patient health poses challenges both in terms of trial design and analysis. Although a variety of biomarker-guided trial designs have been proposed in the literature, navigating the literature to understand their statistical validity, application and interpretation can be difficult. Further, deciding which trial design is appropriate for testing a given hypothesis is often challenging. To address this, we have developed a user-friendly online tool, BiGTeD, informed by a comprehensive literature review, which provides researchers with clarity in definition, methodology and terminology of the various biomarker-guided trial designs. The tool can also help investigators embarking on such trials to choose the most appropriate design for a given purpose.

iii) Immune repertoire sequencing

Each adaptive immune cell has a unique receptor located on its surface, which recognises an immunological target. There is a short genetic region, unique to each adaptive immune cell, which determines the form of this receptor; next generation sequencing can be used to capture this region in a process known as immune repertoire sequencing. This generates a snapshot of the adaptive immune system at the resolution of individual cells, and these datasets have huge potential in improving our understanding of the immune system. Work within our group is focusing on developing novel analysis methods for these datasets which will help us to understand the underlying immune mechanisms and allow us to use these datasets for diagnosis.

iv) Cancer genetics

Cancers arise as a result of genetic changes in the DNA of tumour cells. Identifying these changes can improve diagnosis, inform treatment options, and help us understand the underlying mechanisms of disease. One of our interests is in detecting germline mutations which are known to affect cancer predisposition; this is used clinically to inform treatment options and prognosis, as well as risk to family members. We are also interested in the use of circulating tumour DNA (ctDNA) for disease diagnosis and monitoring. ctDNA is DNA from the tumour which is circulating freely, at potentially very low levels, in the bloodstream. By developing methods of variant calling designed for these low-frequency mutations, we aim to improve the clinical utility of ctDNA.
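A minimal sketch of why low-frequency variant calling is a statistical problem: at a given site, the number of reads supporting a variant can be compared with what sequencing error alone would produce, using a binomial tail probability. This is an illustrative textbook approach, not the group's method; the function name `binomial_tail`, the read depth and the error rate below are assumptions.

```python
import math

def binomial_tail(alt_reads, depth, error_rate):
    """P(X >= alt_reads) for X ~ Binomial(depth, error_rate).

    Probability of seeing at least `alt_reads` variant-supporting reads
    out of `depth` reads if every such read were a sequencing error.
    Requires 0 < error_rate < 1.
    """
    if alt_reads <= 0:
        return 1.0
    log_p = math.log(error_rate)
    log_q = math.log1p(-error_rate)
    total = 0.0
    for k in range(alt_reads, depth + 1):
        # Log of the binomial probability mass, computed via log-gamma
        # so that large read depths do not overflow.
        log_pmf = (math.lgamma(depth + 1) - math.lgamma(k + 1)
                   - math.lgamma(depth - k + 1)
                   + k * log_p + (depth - k) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)

# Hypothetical site: 20 variant reads at depth 10,000 (allele fraction 0.2%)
# with an assumed per-base error rate of 0.05%.
p_value = binomial_tail(20, 10_000, 0.0005)
```

With these assumed numbers the expected count of error reads is 5, so 20 supporting reads would be unlikely under error alone, whereas 5 supporting reads would be unremarkable. Real ctDNA callers also need to account for site-specific error rates and multiple testing.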

v) Machine learning to understand differences in the genome composition of microbes

Work within our group is focussed on applying machine learning to understand differences in genome composition across a wide range of microbes, including human pathogens such as SARS-CoV-2 (the cause of COVID-19), and to assess whether genomic composition biases are associated with microbial traits.

vi) Systematic reviews and meta-analyses of pharmacogenetic studies

In genetic association studies, effect sizes are often modest, or if larger effect sizes are observed these are often for rarer variants. Individual studies are therefore often combined within a systematic review and meta-analysis to increase statistical power. Our group has expertise and extensive experience in conducting large systematic reviews of pharmacogenetic studies, and in applying advanced statistical methods to ensure robust meta-analysis of study-level effect estimates (see Publications section). We have also developed a tool for assessing the methodological quality of pharmacogenetic studies which should be an integral part of the systematic review process.
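As a simple illustration of how study-level effect estimates can be combined, the sketch below shows inverse-variance weighted fixed-effect pooling, the most basic form of meta-analysis. It is a baseline example only and not a description of the more advanced methods the group applies; the function name `fixed_effect_meta` and the input values are hypothetical.

```python
import math

def fixed_effect_meta(estimates, standard_errors):
    """Inverse-variance weighted fixed-effect meta-analysis.

    estimates:       per-study effect estimates (e.g. log odds ratios)
    standard_errors: per-study standard errors of those estimates
    Returns (pooled estimate, pooled standard error).
    """
    # Each study is weighted by the inverse of its variance, so more
    # precise studies contribute more to the pooled estimate.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

# Hypothetical log odds ratios from three studies of the same variant.
pooled, pooled_se = fixed_effect_meta([0.20, 0.35, 0.10], [0.15, 0.20, 0.10])
```

The pooled standard error is smaller than that of any single study, which is the gain in statistical power referred to above. A fixed-effect model assumes one common underlying effect; where studies are heterogeneous, a random-effects model is usually more appropriate.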

vii) Reporting guidelines for pharmacogenetic studies

To facilitate the evidence synthesis process for pharmacogenetic studies, it is essential that primary studies report their methods and findings clearly and transparently. Work undertaken within our group identified that essential items to allow inclusion of a study within a meta-analysis and to allow its quality to be assessed during systematic review were often not reported within the study publications. To address this concern, we developed the STROPS (Strengthening the Reporting Of Pharmacogenetic Studies) guidelines and work is ongoing to encourage endorsement and usage of the guidelines by journals and other stakeholders.

viii) Analysis of epistasis in genetic data

Epistasis analysis studies the joint effects of two loci on a phenotype of interest, building on the hypothesis that some combinations of polymorphisms exhibit a joint effect on a phenotype without a (demonstrable) marginal effect. This analysis poses several challenges, including a combinatorial one: with a million polymorphisms studied, the number of combinations of two (or, by extension, more) polymorphisms becomes exceptionally large, necessitating algorithmic adaptations such as general-purpose computing on graphics processing units (GPGPU).
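The scale of the combinatorial problem can be made concrete with a short calculation. The sketch below counts the number of distinct variant combinations for a panel of one million polymorphisms, which comes to roughly 5 × 10^11 pairs; the function names and the assumed throughput of one million tests per second are illustrative only.

```python
import math

def n_combinations(n_variants, order=2):
    """Number of distinct unordered combinations of `order` variants."""
    return math.comb(n_variants, order)

def hours_to_scan(n_tests, tests_per_second):
    """Wall-clock hours for an exhaustive scan at a given throughput."""
    return n_tests / tests_per_second / 3600.0

n_variants = 1_000_000
pairs = n_combinations(n_variants, 2)    # two-locus interactions
triples = n_combinations(n_variants, 3)  # three-locus interactions

# Assumed throughput of one million interaction tests per second
# on a single processor (a hypothetical figure for illustration).
pair_scan_hours = hours_to_scan(pairs, 1_000_000)
```

Under the assumed throughput an exhaustive pairwise scan takes days on a single processor, and three-locus scans are far beyond reach, which is why massively parallel hardware or search-space reduction is needed.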

ix) Analysis of multi-omics datasets

Our approach here is rooted in multi-view canonical correlation analysis, especially non-linear variants of that methodology, combined with regularization techniques that yield sparse solutions, i.e. solutions containing potentially very few variables. We are currently applying these methods to genome-wide data from the UK Biobank. Interestingly, the methods appear to work very well when at least some deterministic components are replaced with their randomized counterparts, allowing for comparatively rapid analysis of large datasets.
