Context
At a high level, VariantAlleles.com is being built both as a way to learn about modern web application development and as a useful tool for the genetics community. This document focuses on the genetics aspects of the project.
Decisions
Genetic data at VariantAlleles.com will be derived from several sources, each documented below.
Reference Genome
VariantAlleles.com will use the GRCh38 reference genome. There has been a lot of recent focus on adopting GRCh38 as the reference genome, in contrast to 2017, when I started my most recent position and GRCh37 was the standard.
Genes
VariantAlleles.com will use the ACMG secondary findings genes as the default gene list. As of Fall 2023, there are 73 unique genes on the list.
The ACMG secondary findings gene list was downloaded from the supplementary material of the following paper:
Transcripts
VariantAlleles.com will use the MANE transcripts as the default transcript list. I used the UCSC Table Browser to download all of the MANE transcripts for GRCh38, then used R to wrangle and filter the data to include only transcripts in the ACMG secondary findings genes. The final list has 84 unique transcripts, which is more than the 73 genes. This is because some genes have more than one MANE transcript: in addition to the primary MANE transcript (MANE Select), a gene may have additional transcripts so that all likely pathogenic and pathogenic variants are covered. These are called MANE Plus Clinical transcripts. If a gene has a MANE Plus Clinical transcript, that transcript is used as the default transcript for the gene; otherwise the primary MANE transcript is used.
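The default-transcript rule described above can be sketched in a few lines. The original wrangling was done in R; this is a Python sketch of the same idea, not the author's code. The gene symbols, transcript IDs, row layout, and function names are hypothetical placeholders, and the status labels "MANE Select" and "MANE Plus Clinical" are assumed to be how the downloaded table marks each transcript.

```python
from collections import defaultdict

# Hypothetical rows: (gene symbol, transcript ID, MANE status).
# Placeholder values only; these are not real genes or accessions.
SAMPLE_TRANSCRIPTS = [
    ("GENE_A", "TX_A1", "MANE Select"),
    ("GENE_B", "TX_B1", "MANE Select"),
    ("GENE_B", "TX_B2", "MANE Plus Clinical"),
]

# Stand-in for the ACMG secondary findings gene list.
ACMG_GENES = {"GENE_A", "GENE_B"}


def filter_to_gene_list(rows, genes):
    """Keep only transcripts whose gene is on the gene list."""
    return [row for row in rows if row[0] in genes]


def default_transcripts(rows):
    """Pick one default transcript per gene.

    A Plus Clinical transcript wins when the gene has one;
    otherwise the primary (MANE Select) transcript is used.
    """
    by_gene = defaultdict(dict)
    for gene, transcript, status in rows:
        # Keep the first transcript seen for each status.
        by_gene[gene].setdefault(status, transcript)

    defaults = {}
    for gene, by_status in by_gene.items():
        if "MANE Plus Clinical" in by_status:
            defaults[gene] = by_status["MANE Plus Clinical"]
        elif "MANE Select" in by_status:
            defaults[gene] = by_status["MANE Select"]
    return defaults


if __name__ == "__main__":
    rows = filter_to_gene_list(SAMPLE_TRANSCRIPTS, ACMG_GENES)
    print(len(rows), "transcripts across", len({r[0] for r in rows}), "genes")
    print(default_transcripts(rows))
```

The sample data also shows why the transcript count can exceed the gene count: GENE_B contributes two rows but still ends up with a single default.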
Variants
VariantAlleles.com will use ClinVar as the default source for variants. I downloaded a summary text file of all variants in ClinVar from the ClinVar FTP site, then used R to wrangle and filter the data to include only variants that are in the ACMG secondary findings genes and have a MANE transcript.
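As a rough illustration of the filtering step, here is a Python sketch (the original work was done in R, so this is a swap-in, not the author's code). It makes several assumptions: that the summary file is tab-delimited with columns named `GeneSymbol`, `Assembly`, and `Name`; that `Name` begins with the transcript accession followed by the gene in parentheses; and that "have a MANE transcript" means the variant is named on one of the selected MANE transcripts. The sample rows and the function name are made up for the example.

```python
import csv
import io

# Tiny in-memory stand-in for the ClinVar summary file.
# Column names and the Name format are assumptions; values are placeholders.
SAMPLE_TSV = (
    "#AlleleID\tName\tGeneSymbol\tClinicalSignificance\tAssembly\n"
    "1\tTX_A1(GENE_A):c.1A>G\tGENE_A\tPathogenic\tGRCh38\n"
    "1\tTX_A1(GENE_A):c.1A>G\tGENE_A\tPathogenic\tGRCh37\n"
    "2\tTX_Z9(GENE_Z):c.5C>T\tGENE_Z\tBenign\tGRCh38\n"
    "3\tTX_B9(GENE_B):c.7G>A\tGENE_B\tLikely pathogenic\tGRCh38\n"
)


def filter_variants(handle, genes, mane_transcripts, assembly="GRCh38"):
    """Return rows on the chosen assembly, in the gene list,
    and named on one of the given MANE transcripts."""
    # QUOTE_NONE treats quote characters in free-text fields as plain text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    kept = []
    for row in reader:
        if row["Assembly"] != assembly:
            continue
        if row["GeneSymbol"] not in genes:
            continue
        # Assumed Name layout: "<transcript>(<gene>):<change>".
        transcript = row["Name"].split("(", 1)[0]
        if transcript not in mane_transcripts:
            continue
        kept.append(row)
    return kept


if __name__ == "__main__":
    kept = filter_variants(
        io.StringIO(SAMPLE_TSV),
        genes={"GENE_A", "GENE_B"},
        mane_transcripts={"TX_A1", "TX_B1"},
    )
    for row in kept:
        print(row["Name"], row["ClinicalSignificance"])
```

For the real file, the same function could be given a handle from `gzip.open(path, "rt")` if the download is gzip-compressed; the column names should be checked against the file's header first.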