FASTA Format

FASTA format originally derives from the FASTA sequence database searching package. A sequence in FASTA format consists of a single comment line beginning with a '>' symbol, followed by any number of lines, of any length, of sequence information. Every sequence line except the last is often limited to 60 characters. Sequences are expected to be represented in the standard amino acid and nucleic acid codes, with these exceptions: lower-case letters are accepted and are mapped to upper-case; N may be used for an unknown nucleic acid residue and X for an unknown amino acid residue. Sequences in this format might look like:

>18BI1 Human MLC1emb gene for embryonic myosin alkaline light chain, promoter and exon 1 
GTGAAGAGAGAGCTGTGGCATGAAGGGGAGGGGGCTGGTGGCCCCAAACCTGGTGACAA
TACACAGTTGTCAGCTGTACCCTGCTGGCGTTTCTTCCTTTTATAGTCAGCAGCAGTTG
CTCTTGCTTTCACCCAGCCCCTCTGTGGGGCTCCTGCCCAGGATAAAAGGGAAGGGAGG
CAGCCCAGGCTCCTATCTCATCTCCCAGACGCCACGTCTCTCGGTTTCTTCTTAG

>32A5UTR Human alpha-Bcrystallin gene, 5' end 
GTCGACACCACCCAAAATAGTGCCGAGCCTCTTGGGGGGGGAGGGGCTGGGAGTGGGGG
CCCTGAGTGAGAGCAACGAGGGTGTGACCAGCGCCGCCCGGACCCCTAGTCCCCTCCCC
CGCACACTCTTCAGCTGTCGCAGGGGGCCTGAGAGGACAGCTGAGGGTCCTGGCTGGGA
ACGAGCTGGGGAGGGGGAGCTGGTGGTGCCTGGGGCATGAAGAGGCCTCGCTGAGACCC
TCACAAACGGTTTGCACGTTTCCACACCTCATTTTCTCCTCTTCGGTGGCAGGCACTGT
GCACCCAATTCCTAAAGCACTCCTGGATTTAATGTTCTGAGAGCCACATAGAACGAAAG
ATGCAAGAAATCTGTTTGCTCTTTTTTCAGGGGGTGGGGTCTTTCTGCCCAGATGTGGG
ATCCTCTCCTAAACCCAGGTCAACCCAGGGCACGAGGCAGATGGCTGGTGCTGACATGT
TGACCATCACTGCTCTCTTCCAAGGACTCACAAAGAGTTAATGTCCCTGGGGCTCAGCC
TAGGAAGATTCCAGTCCCTGCCCAGGCCCAAGATAGTTGCTGGCCTGATTCCCCTGGCA
TTCAGGACTGGAAAGGAGGAGGAGGGGCACACTACGCCGGCTCCCATCCTCCCCCCACC
CCGCGTGCCTGCTTGGGATTCCTGACTCTGTACCAGCTTCAGAGAACAGGGGTGGGGGT
GGGTGCCATTGGGTGTGGACAGAAAGCTAGTGAAACAAGACCATGACAAGTCACTGGCC
GGCTCAGACGTGTTTGTGTCTCTCTTTTCTTAGCTCAGTGAGTACTGGGTATGTGTCAC
ATTGCCAAATCCCGGATCACAAGTCTCCATGAACTGCTGGTGAGCTAGGATAATAAAAC
CCCTGACATCACCATTCCAGAAGCTTCACAAGACTGCATATATAAGGGGCTGGCTGTAG
CTGCAGCTGAAGGAGCTGACCAGCCAGCT
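
The rules described above (a '>' comment line, sequence data spread over any number of lines, lower-case mapped to upper-case, lines commonly wrapped at 60 characters) can be illustrated with a short Python sketch. This is only a minimal illustration, not the FASTA package's own reader: the function names parse_fasta and format_fasta, the width parameter, and the sample record are hypothetical, and the sketch assumes that blank lines are skipped and that any sequence text appearing before the first '>' line is ignored.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (comment, sequence) pairs from FASTA-formatted lines.

    Assumptions of this sketch: blank lines are skipped, and sequence
    text that appears before the first '>' line is ignored.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()  # drop the '>' marker
            chunks = []
        elif header is not None:
            # lower-case letters are accepted and mapped to upper-case
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def format_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Return one FASTA record with sequence lines wrapped at `width`."""
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + header] + wrapped)


# Hypothetical sample record, used only to exercise the functions.
example = """\
>seq1 hypothetical example record
gtgaagagag
AGCTGNNA
"""
records = list(parse_fasta(example.splitlines()))
print(records)  # [('seq1 hypothetical example record', 'GTGAAGAGAGAGCTGNNA')]
```

Because parse_fasta accepts any iterable of lines, an open file object can be passed to it directly. The default width of 60 in format_fasta follows the common convention mentioned above; the format itself allows lines of any length.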