Archaic Sequence Hub
What is Archaic Introgression
Adaptive introgression refers to the transfer of genetic variants from one population or species to another that confers an evolutionary advantage on the recipient, and it may lead to a faster rate of adaptation than is predicted by models with mutation and selection alone. When adaptive introgression occurs from archaic humans to modern humans, it is specifically termed Archaic Introgression.
The sequencing of archaic human genomes, coupled with the availability of genome sequences from diverse present-day human populations, has revealed multiple episodes of gene flow between archaic and modern humans. In particular, non-African populations have 1–2% Neanderthal ancestry, and Melanesians and East Asians have 3% and 0.2% Denisovan ancestry, respectively. Beyond genome-wide proportions, a number of studies have attempted to characterize how introgressed archaic DNA is distributed along the genome. Analysis of these maps of archaic introgression has yielded novel insights into human evolution and biology.
What We Do in This Project
This study investigates how using the complete reference genome T2T-CHM13 affects the detection of archaic introgression in modern populations. By identifying and comparing the magnitude of archaic introgression detected with GRCh37, GRCh38 and T2T-CHM13, we aim to determine whether T2T-CHM13 corrects previously estimated signals of archaic introgression.
Given the limitations of previous studies, we use IBDmix to detect archaic introgressed sequences in 2,504 samples from 26 geographically diverse populations in the 1000 Genomes Project, based on the three different human reference genomes. In addition, because no publicly available database focuses specifically on archaic introgression in modern humans, this project integrates the archaic introgressed call sets and their corresponding functional impacts across populations and makes them available in this database. We aim to create a user-friendly, easily accessible visual platform that lets researchers and enthusiasts browse and explore archaic introgression along the genome according to their specific interests and their choice of reference genome.
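One simple way to compare the magnitude of introgression across references is to sum the lengths of the called segments per individual under each reference. The sketch below is illustrative only: the tuple layout, the sample ID, and the coordinates are hypothetical and do not represent IBDmix's actual output format or any result from this project.

```python
from collections import defaultdict


def total_introgressed_bp(segments):
    """Sum segment lengths per (reference, individual).

    `segments` is an iterable of (reference, individual, chrom, start, end)
    tuples with half-open coordinates [start, end). This layout is an
    assumption made for the example.
    """
    totals = defaultdict(int)
    for reference, individual, _chrom, start, end in segments:
        if end <= start:
            raise ValueError(f"invalid segment: {start}-{end}")
        totals[(reference, individual)] += end - start
    return dict(totals)


# Hypothetical toy calls for one individual under the three references.
toy_segments = [
    ("GRCh37", "sample_A", "chr1", 1_000_000, 1_060_000),
    ("GRCh38", "sample_A", "chr1", 1_000_500, 1_060_500),
    ("T2T-CHM13", "sample_A", "chr1", 1_002_000, 1_070_000),
    ("T2T-CHM13", "sample_A", "chr2", 5_000_000, 5_055_000),
]

totals = total_introgressed_bp(toy_segments)
for (reference, individual), bp in sorted(totals.items()):
    print(f"{reference}\t{individual}\t{bp}")
```

Because each reference has its own coordinate system, totals like these compare only the amount of introgressed sequence; comparing positions across references would additionally require converting coordinates between assemblies.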
Why We Need a Complete Genome Assembly in Archaic Introgression Analysis
To date, studies of archaic introgression in modern populations have relied primarily on the older reference genome GRCh37. However, the differing quality and completeness of the later reference genomes (GRCh38 and T2T-CHM13) may improve the detection of introgressed signals across the genome.
Built with long-read sequencing technologies, T2T-CHM13 is a complete human reference genome that corrects errors in the prior references and resolves the remaining 8% of the genome. T2T-CHM13 holds great promise for enhancing our understanding of human genomics, including structures such as segmental duplications, repeats, large-scale genomic differences, and centromeres, which have been shown to influence epigenomic and population genetic studies. Nevertheless, the impact of T2T-CHM13 on introgression patterns in human populations remains to be elucidated. This complete, high-quality reference genome therefore provides a unique opportunity to reevaluate and advance our understanding of the archaic genetic legacy in modern populations.
Therefore, we remapped the archaic sequencing reads onto GRCh37, GRCh38 and T2T-CHM13 separately and found that T2T-CHM13 exhibits superior mapping quality. We then reanalyzed archaic introgression using this refined dataset, which showed that more Neanderthal sequences are detected with T2T-CHM13 than with the other two reference genomes. A more complete reference genome such as T2T-CHM13 thus facilitates a more precise examination of human evolution.
How We Did This
Step 1: Digging, Phasing
Step 2: Mapping, SNP Calling
Step 3: IBDmix Calling, Functional Analysis
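Between the IBDmix calling step and the functional analysis, a typical post-processing step is to keep only segments that pass a minimum LOD score and a minimum length. The sketch below shows one way this could look. The `Segment` fields and the default thresholds (LOD 4, 50 kb) are assumptions chosen for illustration, not necessarily the settings used in this project, and the toy records are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A candidate introgressed segment (hypothetical field names)."""
    individual: str
    chrom: str
    start: int   # inclusive start position
    end: int     # exclusive end position
    lod: float   # score supporting the call

    @property
    def length(self):
        return self.end - self.start


def filter_segments(segments, min_lod=4.0, min_length=50_000):
    """Keep segments meeting both the LOD and the length threshold.

    The default thresholds are illustrative assumptions.
    """
    return [
        s for s in segments
        if s.lod >= min_lod and s.length >= min_length
    ]


# Hypothetical toy records.
toy = [
    Segment("sample_A", "chr1", 1_000_000, 1_060_000, 6.2),  # passes both
    Segment("sample_A", "chr1", 2_000_000, 2_020_000, 7.5),  # too short
    Segment("sample_A", "chr3", 4_000_000, 4_080_000, 2.1),  # LOD too low
]

kept = filter_segments(toy)
for s in kept:
    print(s.individual, s.chrom, s.start, s.end, s.lod)
```

Keeping the thresholds as function parameters makes it straightforward to apply the same filter to the call sets from each reference genome.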
What We Are Working On
Finished: Neanderthal