Usage

The fastalengthcounter.pl script counts the number of genetic sequence characters in a fasta format file. The count is printed to the command line and written to a copy of the fasta file. The copy is identical to the original file, except that the genetic sequence character count is appended to its first line.


Command Line:

fastalengthcounter.pl is called with the following command line:

$ fastalengthcounter.pl [<file 1> <file 2> ...]


Arguments:

The following is a list of arguments for fastalengthcounter.pl:

[<file 1> <file 2> ...]
List of fasta files whose genetic sequence characters are to be counted.


Example:

Running fastalengthcounter.pl on the files 'file1.fasta', 'file2.fasta', and 'file3.fasta' with the following command line:

$ fastalengthcounter.pl file1.fasta file2.fasta file3.fasta


Would result in the command line output:

file1.fasta
length 2923488
file2.fasta
length 2873800
file3.fasta
length 2857593
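
The counting and reporting step can be sketched as follows. This is an illustrative Python sketch, not the Perl source of fastalengthcounter.pl. It assumes that lines beginning with '>' are header lines and are excluded from the count, that line endings and surrounding whitespace are not counted, and that the count is the total over all records in the file. The names count_sequence_characters and 'example.fasta' are hypothetical.

```python
def count_sequence_characters(lines):
    """Return the total number of sequence characters across all records."""
    total = 0
    for line in lines:
        stripped = line.strip()
        # Assumption: blank lines and header lines (starting with '>')
        # do not contribute to the count.
        if not stripped or stripped.startswith(">"):
            continue
        total += len(stripped)
    return total


# Hypothetical in-memory stand-in for the lines of a fasta file.
example_lines = [
    ">record1 example header\n",
    "ACGTAC\n",
    "GT\n",
    ">record2 example header\n",
    "TTGA\n",
]

# Report in the same two-line shape as the command line output above.
print("example.fasta")
print("length", count_sequence_characters(example_lines))
```

For the five example lines, the three sequence lines contribute 6, 2, and 4 characters, so the reported length is 12.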


Fasta File Output

When fastalengthcounter.pl is run on a file or group of files, an output file is created for each input file. The output file is identical to the input file, except that the genetic sequence character count for the fasta file is appended to the first line. The string '.length' is also inserted into the output file's extension.


Output File Naming Convention:

If the name of the input file has the '.fasta' extension, the output file extension will be '.length.fasta'. Otherwise, the output file name will be the input file name with the extension '.length.fasta' attached to the end.


Running the example above:

$ fastalengthcounter.pl file1.fasta file2.fasta file3.fasta


Would take the following files:

file1.fasta
file2.fasta
file3.fasta


And output 'length' files, resulting in the following file list:

file1.fasta
file1.length.fasta
file2.fasta
file2.length.fasta
file3.fasta
file3.length.fasta
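
The naming convention can be sketched as follows. This is an illustrative Python sketch, not the script's Perl source. The function name output_file_name is hypothetical, and the sketch assumes the '.fasta' extension is matched case-sensitively at the end of the name.

```python
def output_file_name(input_name):
    """Derive the '.length.fasta' output name from an input file name."""
    suffix = ".fasta"
    if input_name.endswith(suffix):
        # 'file1.fasta' -> 'file1.length.fasta'
        return input_name[: -len(suffix)] + ".length.fasta"
    # Any other name keeps its full original name and gains the new extension.
    return input_name + ".length.fasta"


for name in ["file1.fasta", "genome.fa"]:
    print(name, "->", output_file_name(name))
```

Under these assumptions, 'file1.fasta' maps to 'file1.length.fasta', while a name without the '.fasta' extension, such as 'genome.fa', maps to 'genome.fa.length.fasta'.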


Addition of Count to Output File:

As long as the fasta file can be read properly, an output file is created with the genetic sequence character count added to the first line.


Running the example above on a single file 'file1.fasta':

$ fastalengthcounter.pl file1.fasta


Would read all lines from the file 'file1.fasta':

>NZ_CSJE01000049.1 Staphylococcus aureus strain USFL080, whole genome shotgun sequence
TGTCTTATTTTTTTAAAGTATTTAAAAGTAAAATTACATGTTAATACGTAGTATTAATGGCGAGACTCCT
GAGGGAGCAGTGCCAGTCGAAGACAGGGGCCCCAACACAGAAGCTGACATATAGTCAGCTTACAACAATG
TGCCGGTTGGGGTGGCTGAGACGGCACCCTAGGAAGGGACCCGTCATCAAAAATTCTATTTATAGAATTT
TACAGTAATGTGCCAGATGGGCATAGCGAAGCCATTCAATACGAAGTATTGTATAAATAGAGAACAGCAG
TAAGATATTTTCTAATTGAAAATTATCTTACTGCTG
>NZ_CSJE01000048.1 Staphylococcus aureus strain USFL080, whole genome shotgun sequence
CCTGCGCTCGCACCAATACGTGTCGCACCTGCTTCAACCATTTTATTGAAATCTTCTAAATTACGTACGC
...


And output all lines into the output file 'file1.length.fasta' with the genetic sequence character count for the fasta file appended to the first line:

>NZ_CSJE01000049.1 Staphylococcus aureus strain USFL080, whole genome shotgun sequence, length 2923488
TGTCTTATTTTTTTAAAGTATTTAAAAGTAAAATTACATGTTAATACGTAGTATTAATGGCGAGACTCCT
GAGGGAGCAGTGCCAGTCGAAGACAGGGGCCCCAACACAGAAGCTGACATATAGTCAGCTTACAACAATG
TGCCGGTTGGGGTGGCTGAGACGGCACCCTAGGAAGGGACCCGTCATCAAAAATTCTATTTATAGAATTT
TACAGTAATGTGCCAGATGGGCATAGCGAAGCCATTCAATACGAAGTATTGTATAAATAGAGAACAGCAG
TAAGATATTTTCTAATTGAAAATTATCTTACTGCTG
>NZ_CSJE01000048.1 Staphylococcus aureus strain USFL080, whole genome shotgun sequence
CCTGCGCTCGCACCAATACGTGTCGCACCTGCTTCAACCATTTTATTGAAATCTTCTAAATTACGTACGC
...
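
Appending the count to the first line can be sketched as follows. This is an illustrative Python sketch, not the script's Perl source. The function name annotate_first_line is hypothetical, and the ', length <count>' suffix is taken from the example output above.

```python
def annotate_first_line(lines, count):
    """Return a copy of lines with ', length <count>' appended to the first line."""
    if not lines:
        return []
    # Remove the line ending so the suffix lands on the same line.
    first = lines[0].rstrip("\r\n")
    # All remaining lines are copied unchanged.
    return [f"{first}, length {count}\n"] + list(lines[1:])


# Hypothetical two-line input standing in for a fasta file.
original = [">example_id example description\n", "ACGT\n"]
for line in annotate_first_line(original, 4):
    print(line, end="")
```

Only the first line changes. Every other line, including later header lines, is written to the output exactly as read.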