Description
CESAR uses existing whole-genome alignments to detect conserved coding exons and to map gene annotations from one reference genome to many aligned query genomes. Because genome alignments contain thousands of spurious frameshifts and splice site mutations in exons that are truly conserved, CESAR realigns each exon while taking its reading frame and splice site positions into account. If the query sequence contains an intact exon, the resulting alignment preserves the reading frame and the splice sites.
CESAR detects 91% of shifted splice sites and aligns each shifted splice site to the corresponding reference splice site. The resulting exon mappings are highly specific: 99% of the human exons that lack inactivating mutations in mouse after realignment match annotated mouse exons.
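To make "inactivating mutation" concrete, the following Python sketch lists the kinds of problems the description refers to (frameshifts, in-frame stop codons, splice site mutations) for one aligned internal exon. It is a simplified illustration and not CESAR's realignment method. The function name, its parameters, the rule that any gap run not divisible by 3 counts as a frameshift, and the accepted donor dinucleotides (GT or GC) are all assumptions made for this example.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}

def inactivating_mutations(ref_aln, qry_aln, acceptor, donor, phase=0):
    """Return a list of problems found in one aligned internal exon.

    ref_aln, qry_aln: aligned reference and query exon sequences ('-' = gap).
    acceptor, donor:  query dinucleotides flanking the exon (hypothetical inputs).
    phase:            number of leading bases that complete a codon begun in
                      the previous exon (assumption of this sketch).
    """
    problems = []
    # A gap run whose length is not a multiple of 3 shifts the reading frame.
    # Gaps in the query are deletions; gaps in the reference are insertions.
    for kind, seq in (("deletion", qry_aln), ("insertion", ref_aln)):
        for run in re.finditer(r"-+", seq):
            if len(run.group()) % 3 != 0:
                problems.append(f"frameshift {kind} at column {run.start()}")
    # Scan the ungapped query codon by codon for premature stop codons.
    qry = qry_aln.replace("-", "").upper()
    for i in range(phase, len(qry) - 2, 3):
        if qry[i:i + 3] in STOP_CODONS:
            problems.append(f"in-frame stop at query position {i}")
    # Check the splice site dinucleotides that flank the exon.
    if acceptor.upper() != "AG":
        problems.append("acceptor splice site mutation")
    if donor.upper() not in {"GT", "GC"}:
        problems.append("donor splice site mutation")
    return problems
```

Under these assumptions, an exon counts as intact when the returned list is empty. First and last coding exons would need separate handling (start codon, terminal stop codon), which this sketch leaves out.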
Methods
CESAR was applied to the 144-way alignment that aligns 143 vertebrates to the human hg38 genome. This alignment has increased specificity and sensitivity. All 195,279 coding exons in 19,846 UCSC knownGenes (longest isoform) were realigned with CESAR. All intact exons from the same gene were grouped into a gene model.
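The grouping step described above can be sketched in Python as follows. This is a minimal illustration under assumed inputs, not the pipeline used to build the track: the record layout (gene, chromosome, strand, start, end, intact flag) and the function name are hypothetical.

```python
from collections import defaultdict

def build_gene_models(exons):
    """Group intact exons by gene into simple gene models.

    exons: iterable of (gene, chrom, strand, start, end, intact) tuples
           in query-genome coordinates (hypothetical record layout).
    Returns a dict mapping (gene, chrom, strand) to a sorted list of
    (start, end) exon blocks. Exons that are not intact are skipped.
    """
    by_gene = defaultdict(list)
    for gene, chrom, strand, start, end, intact in exons:
        if intact:
            by_gene[(gene, chrom, strand)].append((start, end))
    # Sort the exon blocks of each gene by coordinate.
    return {key: sorted(blocks) for key, blocks in by_gene.items()}
```

Keying on chromosome and strand as well as gene name is a design choice of this sketch: it keeps exons that map to different loci from being merged into one model.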
The coordinates of the intact exons and annotated genes (genePred format) for the 143 vertebrates are available for download here.
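For readers who want to work with the downloaded gene annotations, here is a small Python sketch that parses one line of a genePred file. It assumes the basic 10-column, tab-separated genePred layout (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds) with 0-based, half-open coordinates. Whether the files offered here carry additional columns is not stated on this page, so any extra fields are simply ignored. The class and function names are illustrative.

```python
from typing import List, NamedTuple, Tuple

class GenePred(NamedTuple):
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exons: List[Tuple[int, int]]  # (start, end) pairs, 0-based half-open

def parse_genepred_line(line: str) -> GenePred:
    """Parse one tab-separated genePred record (first 10 columns only)."""
    f = line.rstrip("\n").split("\t")
    exon_count = int(f[7])
    # exonStarts and exonEnds are comma-separated lists, usually with a trailing comma.
    starts = [int(x) for x in f[8].rstrip(",").split(",")]
    ends = [int(x) for x in f[9].rstrip(",").split(",")]
    if not (len(starts) == len(ends) == exon_count):
        raise ValueError(f"exon count mismatch in record {f[0]}")
    return GenePred(f[0], f[1], f[2], int(f[3]), int(f[4]),
                    int(f[5]), int(f[6]), list(zip(starts, ends)))
```

A record with two exons, such as a hypothetical line `geneA<TAB>chr1<TAB>+<TAB>100<TAB>500<TAB>120<TAB>480<TAB>2<TAB>100,300,<TAB>200,500,`, would yield the exon list `[(100, 200), (300, 500)]`.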
Credits
This track was produced by Virag Sharma and the Hiller Lab at the Max Planck Institute of Molecular Cell Biology and Genetics. For questions regarding this hub, please contact Michael Hiller.
References
Virag Sharma and Michael Hiller. Increased alignment sensitivity improves the usage of genome alignments for comparative gene annotation. Nucleic Acids Research, 45(14):8369-8377, 2017
Virag Sharma, Anas Elghafari, and Michael Hiller. Coding Exon-Structure Aware Realigner (CESAR) utilizes genome alignments for accurate comparative gene annotation. Nucleic Acids Research, 44(11):e103, 2016