Amino acid sequences are provided in one file per species, named by assembly accession, containing the annotated transcripts. Transcripts were annotated with CESAR2.0 using the human (hg38) genome as a reference, the 120-way alignment, and human ENSEMBL96 annotations. If more than one transcript of a gene had exactly the same annotated coding sequence, the sequence is provided for only one of those transcripts. FASTA headers have the form ENSEMBL-TRANSCRIPT-ID|GENE-NAME if the gene name was available from BioMart. Amino acid sequences are given in one-letter code.

Special characters:
X: frame-shifting insertion or deletion
*: stop codon
?: no aligned sequence, assembly gaps in the sequence, or a low sequencing quality score
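
The header layout and special characters described above can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset tooling: the function names (read_fasta, parse_header, count_special) and the transcript ID and gene name in the example are hypothetical, and it assumes that a header with no gene name contains only the transcript ID, with no "|" separator.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Special characters as described in this document.
SPECIAL = {
    "X": "frameshift",  # frame-shifting insertion or deletion
    "*": "stop",        # stop codon
    "?": "missing",     # no alignment, assembly gap, or low quality
}


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def parse_header(header: str) -> Tuple[str, Optional[str]]:
    """Split 'TRANSCRIPT-ID|GENE-NAME' into its two parts.

    Assumption: when no gene name is available, the header holds only
    the transcript ID, and None is returned for the gene name.
    """
    transcript_id, sep, gene_name = header.partition("|")
    return transcript_id, (gene_name if sep and gene_name else None)


def count_special(sequence: str) -> Dict[str, int]:
    """Count frameshift, stop, and missing-sequence characters."""
    counts = Counter(sequence)
    return {label: counts.get(char, 0) for char, label in SPECIAL.items()}


# Hypothetical record; the ID and gene name are made up for illustration.
example = [">ENST00000000001|GENE1", "MKX?L", "A*"]
for header, seq in read_fasta(example):
    print(parse_header(header), count_special(seq))
```

To process a real file, pass an open file handle to read_fasta in place of the example list. Counting the special characters per transcript is one simple way to filter out sequences with many frameshifts or large stretches of missing data before downstream analysis.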