Highlights in dbNSFP v5.1
March 21, 2025
Jin Yu & Xiaoming Liu
In collaboration with HGSC at Baylor College of Medicine, dbNSFP v5.1a now includes variant allele frequencies from an estimated ~1.5 million generally healthy individuals, aggregated from the largest population sequencing projects, including gnomAD (v4.1, 807K total, including ~416K exomes from UK Biobank), TOPMed (Freeze 8, 132K genomes), All of Us (v7.1, 250K genomes) and Regeneron (1M exomes, including UK Biobank).
We believe this resource will aid not only clinical genomic variant interpretation, but also basic research on complex diseases such as schizophrenia and coronary artery disease, where ultra-rare variant strategies can take advantage of this much expanded reference population.
We have also introduced a new column, "dbNSFP_POPMAX_AF", defined as the maximum allele frequency (AF) among all AF columns curated in the current version of dbNSFP. Using this column, we can directly address interesting questions such as: "Among all possible non-synonymous SNVs of known protein-coding genes in the human genome defined by dbNSFP, how many have already been observed in the ~1.5 million individuals sequenced so far?"
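As a rough illustration of the definition above, the following Python sketch takes the maximum over a set of AF columns for each variant and then counts how many variants have been observed at least once. The column names in AF_COLUMNS, the use of "." as the missing-value marker, and the rule that "observed" means a maximum AF above zero are all assumptions made for this example; they are not taken from the dbNSFP documentation, and the actual dbNSFP_POPMAX_AF column may be derived differently.

```python
# Minimal sketch of a "maximum AF across columns" calculation.
# Assumption: missing values are encoded as "." in the input rows.
MISSING = "."

# Hypothetical column names, used only for illustration.
AF_COLUMNS = ["gnomAD_AF", "TOPMed_AF", "AllofUs_AF", "Regeneron_AF"]


def popmax_af(row, af_columns):
    """Return the largest AF among af_columns, or None if every value is missing."""
    values = []
    for col in af_columns:
        raw = row.get(col, MISSING)
        if raw not in (MISSING, ""):
            values.append(float(raw))
    return max(values) if values else None


def fraction_observed(rows, af_columns):
    """Fraction of variants whose maximum AF is above zero (assumed meaning of 'observed')."""
    rows = list(rows)
    if not rows:
        return 0.0
    observed = sum(1 for r in rows if (popmax_af(r, af_columns) or 0.0) > 0.0)
    return observed / len(rows)


# Two made-up variants: one seen in two data sets, one never seen.
example_rows = [
    {"gnomAD_AF": "1.2e-05", "TOPMed_AF": ".", "AllofUs_AF": "3.0e-05", "Regeneron_AF": "."},
    {"gnomAD_AF": ".", "TOPMed_AF": ".", "AllofUs_AF": ".", "Regeneron_AF": "."},
]

if __name__ == "__main__":
    for row in example_rows:
        print(popmax_af(row, AF_COLUMNS))
    print(fraction_observed(example_rows, AF_COLUMNS))
```

Returning None for a variant with no AF values keeps "never observed" distinct from "observed at frequency zero", which matters when counting how much of the possible variant space has been seen.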
The answers to the question posed above are shown in the following figure:
We have rebuilt the dbNSFP v5.1 variant set, covering all potential non-synonymous SNVs and splice-site (acceptor and donor) SNVs, using the latest gene definitions in GENCODE release 47, an update from release 46 used in dbNSFP v5.0. The updated GENCODE release not only refined the transcripts of known genes but also added novel protein-coding genes, both of which affect the determination of non-synonymous and splice-site SNVs.
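One simple way to summarize the effect of such an annotation update is to count variants per chromosome in each version and take the difference. The sketch below assumes each variant is represented as a hypothetical (chromosome, position, ref, alt) tuple and uses made-up toy data; it is not the pipeline used to build dbNSFP.

```python
from collections import Counter


def count_by_chromosome(variants):
    """Count distinct (chrom, pos, ref, alt) variants on each chromosome."""
    return Counter(chrom for chrom, _pos, _ref, _alt in set(variants))


def change_by_chromosome(old_variants, new_variants):
    """Return {chromosome: new_count - old_count} for every chromosome seen in either set."""
    old_counts = count_by_chromosome(old_variants)
    new_counts = count_by_chromosome(new_variants)
    chroms = sorted(set(old_counts) | set(new_counts))
    # Counter returns 0 for chromosomes absent from one of the versions.
    return {c: new_counts[c] - old_counts[c] for c in chroms}


# Toy data only: three variants in the "old" set, four in the "new" set.
old = [("5", 100, "A", "G"), ("5", 200, "C", "T"), ("7", 50, "G", "A")]
new = [("5", 100, "A", "G"), ("5", 200, "C", "T"), ("5", 300, "T", "C"), ("X", 10, "A", "C")]

changes = change_by_chromosome(old, new)

if __name__ == "__main__":
    for chrom, delta in changes.items():
        print(chrom, delta)
```

A net count per chromosome hides cases where similar numbers of variants were added and removed; comparing the two sets directly (for example with set difference) would show gains and losses separately.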
The following figures show an overview of the variant changes by chromosome between dbNSFP v5.1 and v5.0 resulting from the GENCODE update, and a zoomed-in view of chromosome 5, which has the largest change in variant count:
Questions, ideas and suggestions are welcome at feedback@dbnsfp.org