Usage
The fastalengthcounter.pl script counts the number of genetic sequence characters in a FASTA format file. It prints the count to the command line and writes a copy of the FASTA file that is identical to the original, except that the count is appended to the first line.
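The counting step can be sketched in a few lines. This is a minimal, hypothetical Python illustration, not the Perl script itself. It assumes that header lines start with '>' and are skipped, and that every non-whitespace character on any other line counts as a genetic sequence character. The script's exact rule is not stated here, so treat this as an assumption. The function name count_sequence_chars is also hypothetical.

```python
from typing import Iterable


def count_sequence_chars(lines: Iterable[str]) -> int:
    """Count genetic sequence characters in FASTA-formatted lines.

    Assumption: header lines begin with '>' and are ignored; on all other
    lines, every non-whitespace character is counted.
    """
    total = 0
    for line in lines:
        if line.startswith(">"):
            continue  # header line, not sequence data
        total += sum(1 for ch in line if not ch.isspace())
    return total


example = [
    ">seq1 example header\n",
    "ACGTACGT\n",
    "ACG\n",
    ">seq2 another header\n",
    "TTGCA\n",
]
print(count_sequence_chars(example))  # 16
```

Passing an open text file (for example, the result of `open("file1.fasta")`) works the same way, because iterating over a text file yields its lines.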
Command Line:
fastalengthcounter.pl is invoked from the command line as follows:
$ fastalengthcounter.pl [<file 1> <file 2> ...]
Arguments:
The following is a list of arguments for fastalengthcounter.pl:
[<file 1> <file 2> ...]
    One or more FASTA files in which to count genetic sequence characters.
Example:
For example, to run fastalengthcounter.pl on the files 'file1.fasta', 'file2.fasta', and 'file3.fasta', use the command:
$ fastalengthcounter.pl file1.fasta file2.fasta file3.fasta
This produces the following command line output:
file1.fasta length 2923488
file2.fasta length 2873800
file3.fasta length 2857593
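The per-file output amounts to a loop over the file arguments that prints one `<file> length <count>` line per file. The following is a hypothetical Python re-creation for illustration only; the helper names sequence_length and report are invented. It assumes the counting rule that non-whitespace characters on lines not starting with '>' are counted.

```python
import sys
from typing import List


def sequence_length(path: str) -> int:
    """Count sequence characters in one FASTA file.

    Assumed rule: skip header lines starting with '>' and count every
    non-whitespace character on the remaining lines.
    """
    with open(path) as handle:
        return sum(
            len("".join(line.split()))  # drop all whitespace, then measure
            for line in handle
            if not line.startswith(">")
        )


def report(paths: List[str]) -> List[str]:
    """Build one '<file> length <count>' line per input file, in argument order."""
    return [f"{path} length {sequence_length(path)}" for path in paths]


if __name__ == "__main__":
    # Treat every command line argument as a FASTA file path.
    for line in report(sys.argv[1:]):
        print(line)
```

Invoked as `python sketch.py file1.fasta file2.fasta file3.fasta`, this prints one line per file in the same shape as the example output.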
FASTA File Output
When fastalengthcounter.pl is run on one or more files, it creates one output file per input file. Each output file is identical to its input file, except that the genetic sequence character count is appended to the first line. The output file name also includes the string '.length'.
Output File Naming Convention:
If the input file name ends in the '.fasta' extension, the output file name replaces that extension with '.length.fasta' (for example, 'file1.fasta' becomes 'file1.length.fasta'). Otherwise, the output file name is the full input file name with '.length.fasta' appended (for example, 'genome.fa' becomes 'genome.fa.length.fasta').
Running the example above:
$ fastalengthcounter.pl file1.fasta file2.fasta file3.fasta
takes the input files:
file1.fasta file2.fasta file3.fasta
and writes one '.length' output file for each, resulting in the following file list:
file1.fasta file1.length.fasta file2.fasta file2.length.fasta file3.fasta file3.length.fasta
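The naming convention can be expressed as a small function. This is a minimal Python sketch of the stated rule, not code from the script; the name output_name is hypothetical, and the match on '.fasta' is assumed to be exact and case-sensitive.

```python
def output_name(input_name: str) -> str:
    """Derive the '.length' output file name from an input file name.

    A trailing '.fasta' extension becomes '.length.fasta'; any other name
    gets '.length.fasta' appended. Assumption: the '.fasta' check is
    case-sensitive.
    """
    suffix = ".fasta"
    if input_name.endswith(suffix):
        return input_name[: -len(suffix)] + ".length" + suffix
    return input_name + ".length.fasta"


print(output_name("file1.fasta"))  # file1.length.fasta
print(output_name("genome.fa"))    # genome.fa.length.fasta
```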
Addition of Count to Output File:
As long as the FASTA file can be read successfully, an output file is created with the genetic sequence character count appended to the first line.
For example, running fastalengthcounter.pl on the single file 'file1.fasta':
$ fastalengthcounter.pl file1.fasta
reads every line of 'file1.fasta':
>NZ_CSJE01000049.1 Staphylococcus aureus strain USFL080, whole genome shotgun sequence
TGTCTTATTTTTTTAAAGTATTTAAAAGTAAAATTACATGTTAATACGTAGTATTAATGGCGAGACTCCT
GAGGGAGCAGTGCCAGTCGAAGACAGGGGCCCCAACACAGAAGCTGACATATAGTCAGCTTACAACAATG
TGCCGGTTGGGGTGGCTGAGACGGCACCCTAGGAAGGGACCCGTCATCAAAAATTCTATTTATAGAATTT
TACAGTAATGTGCCAGATGGGCATAGCGAAGCCATTCAATACGAAGTATTGTATAAATAGAGAACAGCAG
TAAGATATTTTCTAATTGAAAATTATCTTACTGCTG
>NZ_CSJE01000048.1 Staphylococcus aureus strain USFL080, whole genome shotgun sequence
CCTGCGCTCGCACCAATACGTGTCGCACCTGCTTCAACCATTTTATTGAAATCTTCTAAATTACGTACGC
...
The output file 'file1.length.fasta' then contains the same lines, with the genetic sequence character count for the whole file appended to the first line:
>NZ_CSJE01000049.1 Staphylococcus aureus strain USFL080, whole genome shotgun sequence, length 2923488
TGTCTTATTTTTTTAAAGTATTTAAAAGTAAAATTACATGTTAATACGTAGTATTAATGGCGAGACTCCT
GAGGGAGCAGTGCCAGTCGAAGACAGGGGCCCCAACACAGAAGCTGACATATAGTCAGCTTACAACAATG
TGCCGGTTGGGGTGGCTGAGACGGCACCCTAGGAAGGGACCCGTCATCAAAAATTCTATTTATAGAATTT
TACAGTAATGTGCCAGATGGGCATAGCGAAGCCATTCAATACGAAGTATTGTATAAATAGAGAACAGCAG
TAAGATATTTTCTAATTGAAAATTATCTTACTGCTG
>NZ_CSJE01000048.1 Staphylococcus aureus strain USFL080, whole genome shotgun sequence
CCTGCGCTCGCACCAATACGTGTCGCACCTGCTTCAACCATTTTATTGAAATCTTCTAAATTACGTACGC
...
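Building the output copy only changes the first line; every other line passes through untouched. The following is a minimal Python sketch that assumes the count has already been computed and is appended as `, length <count>`, matching the format shown in the example output. The function name append_length is hypothetical, and the sketch is not taken from the Perl script.

```python
from typing import List


def append_length(lines: List[str], count: int) -> List[str]:
    """Return a copy of the FASTA lines with ', length <count>' added to line 1.

    Assumption: the ', length <count>' format is taken from the example
    output. The first line keeps its original line ending ('\\n', '\\r\\n',
    or none); all other lines are copied unchanged.
    """
    if not lines:
        return []
    first = lines[0].rstrip("\r\n")
    ending = lines[0][len(first):]  # whatever line ending was stripped
    return [f"{first}, length {count}{ending}"] + lines[1:]


original = [">seq1 example header\n", "ACGT\n"]
print(append_length(original, 4)[0], end="")  # >seq1 example header, length 4
```

Writing the returned lines with `writelines()` to the '.length' file name then yields the output file.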