Usage
The getReadLength script reports the read length of each input file, measured as the number of genetic sequence characters on the file's second line, and prints the result to the command line.
Command Line:
getReadLength is called with the following command:
$ getReadLength [<file 1> <file 2> ...]
Arguments:
getReadLength accepts the following arguments:
[<file 1> <file 2> ...]
    List of fasta, fastq, or gzip-compressed fastq files whose genetic sequence characters are counted.
Example:
Running getReadLength on a file named 'file1.fasta':
$ getReadLength file1.fasta
If 'file1.fasta' contained the following lines of data:
>NZ_CHKO01000053.1 Staphylococcus aureus strain USFL094, whole genome shotgun sequence
TGTCTTATTTTTTTAAAGTATTTAAAAGTAAAATTACATGTTAATACGTAGTATTAATGGCGAGACTCCT
GAGGGAGCAGTGCCAGTCGAAGACAGGGGCCCCAACACAGAAGCTGACATATAGTCAGCTTACAACAATG
TGCCGGTTGGGGTGGCTGAGACGGCACCCTAGGAAGGGACCCGTCATCAAAAATTCTATTTATAGAATTT
TACAGTAATGTGCCAGATGGGCATAGCGAAGCCATTCAATACGAAGTATTGTATAAATAGAGAACAGCAG
TAAGATATTTTCTAATTGAAAATTATCTTACTGCTG
>NZ_CHKO01000052.1 Staphylococcus aureus strain USFL094, whole genome shotgun sequence
TTCTTAGGCAATGTAAAAAAGCTGATTTCTATTAATTATTTGATAGAAATCAGCTTTTTTGATATGTATT
TTATAATGTACAGCTCGTTGAGCTGCTATTTTCCTTATATTAAGTGCCATTAATACAAAACCTAGCTCTC
...
Would result in the command line output:
file1.fasta 70
Valid Files
The getReadLength script handles both fasta and fastq file formats. It also handles gzip-compressed fastq files.
Files Must Be Fasta or Fastq:
File names must match one of the following patterns:
- *.fq.gz
- *.fastq.gz
- *.fq
- *.fqs
- *.fastq
- *.fastqs
- *.fa
- *.fas
- *.fasta
- *.fastas
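The patterns above can be checked with a simple glob match. The following is a minimal Python sketch, not the script's actual implementation; the names `VALID_PATTERNS` and `is_valid_file` are hypothetical, and case-sensitive matching is an assumption.

```python
from fnmatch import fnmatchcase

# Glob patterns copied from the list above.
VALID_PATTERNS = (
    "*.fq.gz", "*.fastq.gz",
    "*.fq", "*.fqs", "*.fastq", "*.fastqs",
    "*.fa", "*.fas", "*.fasta", "*.fastas",
)


def is_valid_file(filename):
    """Return True if filename matches one of the accepted patterns.

    Matching is case-sensitive (an assumption; the source does not say).
    """
    return any(fnmatchcase(filename, pattern) for pattern in VALID_PATTERNS)


# Hypothetical file names, used only for illustration.
for name in ("file1.fasta", "reads.fq.gz", "notes.txt"):
    print(name, is_valid_file(name))
```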
Read Length Gathered From Second Line of File:
The file formats accepted by getReadLength differ in layout, but in each of them the first line is a header and the second line holds genetic sequence characters. The character count is therefore always taken from the second line of the input file.
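The second-line rule can be illustrated on in-memory records. This is a minimal Python sketch, assuming line endings are not counted; `second_line_length` and the sample records are hypothetical and not part of getReadLength.

```python
from itertools import islice


def second_line_length(lines):
    """Return the length of the second line, or None if fewer than two lines exist."""
    second = next(islice(lines, 1, 2), None)
    if second is None:
        return None
    # Line endings are stripped so only sequence characters are counted.
    return len(second.rstrip("\r\n"))


# Made-up records: a fastq record has four lines, a fasta record may wrap.
fastq_record = ["@read1\n", "ACGTACGTAC\n", "+\n", "IIIIIIIIII\n"]
fasta_record = [">seq1 example\n", "ACGTAC\n", "GTACGT\n"]

print(second_line_length(fastq_record))  # 10
print(second_line_length(fasta_record))  # 6
```

For a fasta file whose sequences wrap across several lines, the value is the width of the first sequence line rather than the full sequence length, which is why the example file reports 70.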
Running the previous example on the file 'file1.fasta':
$ getReadLength file1.fasta
Would read the following lines of data:
>NZ_CHKO01000053.1 Staphylococcus aureus strain USFL094, whole genome shotgun sequence
TGTCTTATTTTTTTAAAGTATTTAAAAGTAAAATTACATGTTAATACGTAGTATTAATGGCGAGACTCCT
GAGGGAGCAGTGCCAGTCGAAGACAGGGGCCCCAACACAGAAGCTGACATATAGTCAGCTTACAACAATG
TGCCGGTTGGGGTGGCTGAGACGGCACCCTAGGAAGGGACCCGTCATCAAAAATTCTATTTATAGAATTT
TACAGTAATGTGCCAGATGGGCATAGCGAAGCCATTCAATACGAAGTATTGTATAAATAGAGAACAGCAG
TAAGATATTTTCTAATTGAAAATTATCTTACTGCTG
>NZ_CHKO01000052.1 Staphylococcus aureus strain USFL094, whole genome shotgun sequence
TTCTTAGGCAATGTAAAAAAGCTGATTTCTATTAATTATTTGATAGAAATCAGCTTTTTTGATATGTATT
TTATAATGTACAGCTCGTTGAGCTGCTATTTTCCTTATATTAAGTGCCATTAATACAAAACCTAGCTCTC
...
Only the second line is then used for the genetic sequence character count:
TGTCTTATTTTTTTAAAGTATTTAAAAGTAAAATTACATGTTAATACGTAGTATTAATGGCGAGACTCCT
Resulting in the following command line output:
file1.fasta 70
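The whole walk-through, including gzip-compressed input, might be sketched in Python as follows. This is an illustration under stated assumptions, not the getReadLength source: the function names `open_text`, `read_length`, and `report` are hypothetical, compression is detected from the `.gz` suffix, and the output separator is assumed to be a single space as in the example output.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed file in text mode.

    Compression is detected from the '.gz' suffix (an assumption).
    """
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_length(path):
    """Return the character count of the second line of the file at path."""
    with open_text(path) as handle:
        handle.readline()  # skip the header line
        return len(handle.readline().rstrip("\r\n"))


def report(paths):
    """Yield one '<file> <length>' line per input file."""
    for path in paths:
        yield f"{path} {read_length(path)}"
```

Printing each line yielded by `report(sys.argv[1:])` would give one line of output per file, in the same shape as the example output.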