Usage

The gbk2fasta script converts files from GenBank format to FASTA format. More information on various genetic sequence formats may be found here.


Command Line:

gbk2fasta is run from the command line as follows:

$ gbk2fasta <file name>


Arguments:

The following is a list of arguments for gbk2fasta:

<file name>
Name of the file to be converted from GenBank to FASTA format.


Example:

Converting a GenBank file named 'file1.gbk':

$ gbk2fasta file1.gbk


This converts a file in GenBank format, such as:

LOCUS       AB000263                 368 bp    mRNA    linear   PRI 05-FEB-1999
DEFINITION  Homo sapiens mRNA for prepro cortistatin like peptide, complete
            cds.
ACCESSION   AB000263
ORIGIN
        1 acaagatgcc attgtccccc ggcctcctgc tgctgctgct ctccggggcc acggccaccg
       61 ctgccctgcc cctggagggt ggccccaccg gccgagacag cgagcatatg caggaagcgg
      121 caggaataag gaaaagcagc ctcctgactt tcctcgcttg gtggtttgag tggacctccc
      181 aggccagtgc cgggcccctc ataggagagg aagctcggga ggtggccagg cggcaggaag
      241 gcgcaccccc ccagcaatcc gcgcgccggg acagaatgcc ctgcaggaac ttcttctgga
      301 agaccttctc ctcctgcaaa taaaacctca cccatgaatg ctcacgcaag tttaattaca
      361 gacctgaa


Into the FASTA format:

>AB000263 |acc=AB000263|descr=Homo sapiens mRNA for prepro cortistatin like peptide, complete cds.|len=368
ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCC
CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGC
CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGG
AAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCC
CTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAG
TTTAATTACAGACCTGAA


Known Bugs

May Remove Data From Input File:

Output file names are derived from input file names, using the first instance of "." as a delimiter. For example, the input file name "9266_1#1.fasta" would be converted to the output file name "9266_1#1.fasta", and the input file name "GCF_000982775.1_Staphylococcus_aureus_USFL094_Contig.fasta" would result in the output file name "GCF_000982775.fasta".
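The naming rule described above can be reproduced with a single split, which also makes it easy to check ahead of time whether a given input would collide with its own output. The helper names `output_name` and `overwrites_input` are hypothetical, and the ".fasta" suffix is inferred from the two examples rather than taken from the script's source.

```python
def output_name(input_name: str) -> str:
    """Hypothetical reconstruction of gbk2fasta's output naming rule."""
    # Keep everything before the FIRST "." and append ".fasta" (inferred from the examples).
    stem = input_name.split(".", 1)[0]
    return stem + ".fasta"


def overwrites_input(input_name: str) -> bool:
    """True when the derived output name is identical to the input name."""
    return output_name(input_name) == input_name


print(output_name("9266_1#1.fasta"))  # 9266_1#1.fasta
print(output_name("GCF_000982775.1_Staphylococcus_aureus_USFL094_Contig.fasta"))  # GCF_000982775.fasta
print(overwrites_input("9266_1#1.fasta"))  # True
```

Any input whose only "." introduces a ".fasta" extension maps onto itself, which is exactly the case that triggers the data loss described next.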


If the input file name matches the output file name, as in the "9266_1#1.fasta" example above, the input file is completely replaced with an empty file of the same name. This happens because the input file is read from and written to at the same time.
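The empty-file result is consistent with how opening a file for writing behaves: in Python, as with C's `fopen` in mode "w", opening an existing file for writing truncates it to zero length immediately, so anything that reads it afterwards sees nothing. The sketch below demonstrates this with a throwaway file and then shows one possible workaround: read the whole input before the output is opened, and refuse to write over the input path. Both functions (`demonstrate_truncation`, `safe_convert`) are hypothetical and not part of gbk2fasta; this illustrates the mechanism the bug report describes rather than diagnosing the script's actual code.

```python
import os
import tempfile


def demonstrate_truncation() -> str:
    """Read and write the same path at once; return what is left on disk."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "9266_1#1.fasta")
        with open(path, "w") as f:
            f.write("LOCUS       AB000263\n")
        # Opening `path` with "w" truncates it before src.read() runs.
        with open(path) as src, open(path, "w") as dst:
            dst.write(src.read().upper())
        with open(path) as f:
            return f.read()


def safe_convert(in_path: str, out_path: str, convert) -> None:
    """Hypothetical workaround: never open the input path for writing."""
    if os.path.realpath(in_path) == os.path.realpath(out_path):
        raise ValueError(f"output {out_path!r} would overwrite input {in_path!r}")
    with open(in_path) as src:
        text = src.read()              # input is fully read before any output exists
    with open(out_path, "w") as dst:
        dst.write(convert(text))


print(repr(demonstrate_truncation()))  # ''
```

Comparing `os.path.realpath` values rather than raw strings also catches cases where the same file is reached through a relative path or a symbolic link.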


Input Files Must Be In GenBank Format:

Input files in any format other than GenBank will produce an empty output file.
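Because a non-GenBank input silently yields an empty output, a caller may want to check the input before running the conversion. The function below is a hypothetical heuristic, not something gbk2fasta provides: it accepts text whose first non-blank line starts with `LOCUS` (the line that opens a GenBank record) and that contains a line beginning with `ORIGIN`. Some GenBank entries carry no ORIGIN sequence section, and this check would reject them.

```python
def looks_like_genbank(text: str) -> bool:
    """Heuristic pre-check (hypothetical): does `text` resemble a GenBank record?"""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("LOCUS"):
        return False
    # The sequence section starts at a line beginning with the ORIGIN keyword.
    return any(line.startswith("ORIGIN") for line in lines)


print(looks_like_genbank(">AB000263\nACAAGATGCC\n"))  # False
print(looks_like_genbank("LOCUS       AB000263\nORIGIN\n        1 acaagatgcc\n//\n"))  # True
```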