A Comprehensive SNP and Indel Imputability Database

Duan, Qing; Liu, Eric Yi; Croteau-Chonka, Damien C.; Mohlke, Karen L.; & Li, Yun. (2013). A Comprehensive SNP and Indel Imputability Database. Bioinformatics, 29(4), 528-31.

Motivation: Genotype imputation has become an indispensable step in genome-wide association studies (GWAS). Imputation accuracy, which directly influences downstream analysis, has been shown to improve with re-sequencing-based reference panels; however, this comes at the cost of a high computational burden because of the huge number of potentially imputable markers (tens of millions) discovered by sequencing a large number of individuals. There is therefore an increasing need for access to imputation quality information without actually conducting imputation. To facilitate this process, we have established a publicly available SNP and indel imputability database that aims to provide direct access to imputation accuracy information for markers identified by the 1000 Genomes Project across four major populations and covering multiple GWAS genotyping platforms.

Results: SNP and indel imputability information can be retrieved through a user-friendly interface by providing the ID(s) of the desired variant(s) or by specifying the desired genomic region. The query results can be refined by selecting relevant GWAS genotyping platform(s). This is the first database providing variant imputability information specific to each continental group and to each genotyping platform. In Filipino individuals from the Cebu Longitudinal Health and Nutrition Survey, our database can achieve an area under the receiver-operating characteristic curve of 0.97, 0.91, 0.88 and 0.79 for markers with minor allele frequency >5%, 3–5%, 1–3% and 0.5–1%, respectively. Specifically, by filtering out 48.6% of markers (corresponding to a reduction of up to 48.6% in computational costs for actual imputation) based on the imputability information in our database, we can remove 77%, 58%, 51% and 42% of the poorly imputed markers at the cost of only 0.3%, 0.8%, 1.5% and 4.6% of the well-imputed markers with minor allele frequency >5%, 3–5%, 1–3% and 0.5–1%, respectively.




Reference type: JOUR
Authors: Duan, Qing; Liu, Eric Yi; Croteau-Chonka, Damien C.; Mohlke, Karen L.; Li, Yun
Year: 2013
Journal: Bioinformatics
Volume: 29
Issue: 4
Pages: 528-31
Date: January 3, 2013
DOI: 10.1093/bioinformatics/bts724
Reference ID: 190