Supplementary Material for
"Coding Exon-Structure Aware Realigner (CESAR) utilizes genome alignments for accurate comparative gene annotation"
Virag Sharma, Anas Elghafari and Michael Hiller
Nucleic Acids Res., doi:10.1093/nar/gkw210. 2016
http://nar.oxfordjournals.org/content/early/2016/03/25/nar.gkw210.long

################################################
simulatedAlignments/
Exon and cDNA datasets of simulated alignments that we used to compare CESAR to other spliced aligners.
See README in simulatedAlignments/ for more details.

################################################
realignedMaf_100way.maf
Realigned 100-way alignment for 188,788 coding exons from 19,865 human genes generated with CESAR in maf format.
The alignment also contains 20 bp flanking sequence for each exon to allow inspection of the sequence context of the acceptor site/donor site.

################################################
humanExonsMappedByCESAR/
This directory contains coordinates of the gene models obtained by mapping human genes to 99 non-human vertebrates after realignment with CESAR.
The format is UCSC's genePred format (see https://genome.ucsc.edu/FAQ/FAQformat.html#format9).
Each file is named $assembly.gp (where $assembly is the genome assembly).

To convert genePred into bed12 format (https://genome.ucsc.edu/FAQ/FAQformat.html), use genePredToBed from the UCSC kent source code:
genePredToBed humanExonsMappedByCESAR/$assembly.gp humanExonsMappedByCESAR/$assembly.bed

To get the coordinates of the individual exons, convert bed12 to bed4 by using bedToExons (kent source code):
bedToExons humanExonsMappedByCESAR/$assembly.bed humanExonsMappedByCESAR/$assembly.exons.bed

To convert genePred to gtf, use genePredToGtf (kent source code):
genePredToGtf file humanExonsMappedByCESAR/$assembly.gp humanExonsMappedByCESAR/$assembly.gtf

################################################
RealSSshifts.txt
Real splice site shifts in the mouse, rat, cow and dog genomes (the details are given in the file header).

################################################
# disk space
317M	./humanExonsMappedByCESAR
30M	./simulatedAlignments/alignments_ExonsOfInterest/intactExons
14M	./simulatedAlignments/alignments_ExonsOfInterest/noFrameShift
14M	./simulatedAlignments/alignments_ExonsOfInterest/oneFrameShift
14M	./simulatedAlignments/alignments_ExonsOfInterest/spliceSiteShifts
15M	./simulatedAlignments/alignments_ExonsOfInterest/twoCompFrameShift
85M	./simulatedAlignments/alignments_ExonsOfInterest
863M	./simulatedAlignments/alignments_entireCDNAs/intactExons
161M	./simulatedAlignments/alignments_entireCDNAs/noFrameShift
160M	./simulatedAlignments/alignments_entireCDNAs/oneFrameShift
160M	./simulatedAlignments/alignments_entireCDNAs/spliceSiteShifts
162M	./simulatedAlignments/alignments_entireCDNAs/twoCompFrameShift
1.5G	./simulatedAlignments/alignments_entireCDNAs
1.6G	./simulatedAlignments
9.5G	.
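
################################################
# quick look at exon coordinates without the kent tools
For a quick inspection of the genePred files in humanExonsMappedByCESAR/ when the kent tools are not installed, the per-exon coordinates can be listed with awk alone. This is only a minimal sketch, not part of the original data release and not a replacement for genePredToBed/bedToExons. It assumes the standard tab-separated genePred column order with no leading bin column (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds). The function name gp_to_exons is hypothetical.

```shell
# Hypothetical helper: print one line per exon (chrom, start, end, gene name)
# from a genePred file. Assumes tab-separated genePred with no bin column.
gp_to_exons() {
    awk -F'\t' 'BEGIN { OFS = "\t" }
    {
        # Columns 9 and 10 are comma-separated lists of exon starts and ends
        split($9, starts, ",")
        split($10, ends, ",")
        # Column 8 is the exon count
        for (i = 1; i <= $8; i++)
            print $2, starts[i], ends[i], $1
    }' "$1"
}
```

Example use, following the naming above:
gp_to_exons humanExonsMappedByCESAR/$assembly.gp > $assembly.exons.tsv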