Exon and cDNA datasets of simulated alignments. 

The two directories contain the following content: 

alignments_ExonsOfInterest:  Only the alignments of a single exon that we used to evaluate the different alignment methods.
                             These alignments are useful for inspecting where the methods report an alignment that differs from the true alignment. 
alignments_entireCDNAs:      Full alignments of the cDNA to the part of the query genome fully containing this gene. 
                             These datasets can be used to benchmark new methods.


Each directory contains subdirectories with the different datasets:
1) intactExons - The exon has an intact reading frame and identical splice sites
2) noFrameShift - The exon contains two spurious frameshifts which are 6 to 12 bp apart. The correct alignment should not have any frameshift.
3) oneFrameShift - The exon has exactly one real frameshift. 
4) twoCompFrameShifts - The exon has two real compensating frameshifts which are 30 to 45 bp apart. The correct alignment should have two frameshifts.
5) spliceSiteShifts - One splice site (either the acceptor or the donor site) is shifted in each exon. The shift can either shorten or lengthen the exon.


Each of these subdirectories contains 10 files, one for each evolutionary distance from 0.1 to 1.0 substitutions per neutral site.

Each file contains the results for these alignment methods:
1) Spaln
2) Exonerate
3) Pairagon
4) Genewise
5) CESAR
For Spaln and Exonerate, aa2Genome or protein2genome refers to the mode in which the protein sequence was used as input, while nt2Genome or coding2genome refers to the mode in which the coding sequence was used as input.

Each alignment test case is named using the following convention: $geneName_$exonID_$exonNumber
  $geneName refers to the mouse gene names that come from the UCSC canonical gene set for chr19 genes
  $exonID refers to an internal Evolver identifier
  $exonNumber is the number of the exon in the transcript
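
A minimal Python sketch of how a test case name could be split into its three parts. It assumes that $exonID and $exonNumber contain no underscores and that $exonNumber is an integer; the function name and the example name "GeneX_1234_5" are hypothetical and not taken from the datasets.

```python
def parse_test_case_name(name):
    """Split a '$geneName_$exonID_$exonNumber' string into its three fields."""
    # Split from the right so that a gene name that itself contains
    # underscores stays in one piece (assumption: the last two fields do not).
    gene_name, exon_id, exon_number = name.rsplit("_", 2)
    # Assumption: the exon number is a plain integer.
    return gene_name, exon_id, int(exon_number)


if __name__ == "__main__":
    # Hypothetical test case name, for illustration only.
    print(parse_test_case_name("GeneX_1234_5"))
```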


For the exon alignments, 
 each alignment additionally lists the number of bases from the partial codons that are split by the upstream and downstream introns. Those partial codon bases are in lower case.
 Note that CESAR only aligns individual exons. So to create a cDNA alignment with CESAR, we aligned each exon individually.
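
 Because the partial codon bases are in lower case, their number can also be recovered from the exon sequence itself. The Python sketch below counts the lower-case bases at the start and at the end of one aligned exon sequence. It assumes that all other exon bases are in upper case and that gaps are written as '-'; the function name and the example sequence are hypothetical.

```python
def partial_codon_lengths(aligned_exon, gap_chars="- "):
    """Return (upstream, downstream) counts of lower-case partial codon bases."""
    # Ignore gap characters and blanks (assumption: gaps are '-').
    bases = [c for c in aligned_exon if c not in gap_chars]

    # Count the lower-case bases at the start of the exon.
    upstream = 0
    for c in bases:
        if not c.islower():
            break
        upstream += 1

    # Count the lower-case bases at the end, skipping those already counted.
    downstream = 0
    for c in reversed(bases[upstream:]):
        if not c.islower():
            break
        downstream += 1

    return upstream, downstream


if __name__ == "__main__":
    # Hypothetical exon sequence, for illustration only.
    print(partial_codon_lengths("agATG-CCCTTTg"))
```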

For the cDNA alignments, 
 >n0 refers to the input cDNA
 >n?_500 refers to the evolved query genome. Only the part of the genome that fully contains the gene, plus 500 bp flanks, is shown.
 The spaces in the cDNA correspond to the intronic sequence in the query genome. 
 The first and last exons have a flanking sequence context of 500 bp upstream and downstream, respectively.

 To test another method on this benchmark set, extract the lines corresponding to the original alignment and strip all gaps and blank spaces from both the cDNA and the query genome sequence.
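
 The stripping step can be sketched in Python as follows. This is only an illustration: it assumes that gaps are written as '-' and that each aligned sequence has already been extracted as a string; the function name and the example rows are hypothetical.

```python
def strip_alignment_row(row, gap_chars="-"):
    """Remove gap characters and all whitespace from one aligned sequence."""
    # Assumption: '-' is the gap character. Blanks (which mark introns in the
    # cDNA row) and line breaks are removed as well.
    return "".join(c for c in row if c not in gap_chars and not c.isspace())


if __name__ == "__main__":
    # Hypothetical aligned rows, for illustration only.
    cdna_row = "ATG--CC      GTA"
    genome_row = "ATGAACCgtaagtGTA"
    print(strip_alignment_row(cdna_row))
    print(strip_alignment_row(genome_row))
```

 The two ungapped sequences can then be given to the method under test as its cDNA and genome input.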