Usage

The getReadLength script finds the maximum number of genetic sequence characters on a single line of each input file and prints that length to the command line.


Command Line:

getReadLength is called with the following command:

$ getReadLength [<file 1> <file 2> ...]


Arguments:

getReadLength accepts the following arguments:

[<file 1> <file 2> ...]
List of fasta, fastq, or gzipped (.gz) files in which to count the max genetic sequence characters on a single line.


Example:

Running getReadLength on a file named 'file1.fasta':

$ getReadLength file1.fasta


If 'file1.fasta' contained the following lines of data:

>NZ_CHKO01000053.1 Staphylococcus aureus strain USFL094, whole genome shotgun sequence
TGTCTTATTTTTTTAAAGTATTTAAAAGTAAAATTACATGTTAATACGTAGTATTAATGGCGAGACTCCT
GAGGGAGCAGTGCCAGTCGAAGACAGGGGCCCCAACACAGAAGCTGACATATAGTCAGCTTACAACAATG
TGCCGGTTGGGGTGGCTGAGACGGCACCCTAGGAAGGGACCCGTCATCAAAAATTCTATTTATAGAATTT
TACAGTAATGTGCCAGATGGGCATAGCGAAGCCATTCAATACGAAGTATTGTATAAATAGAGAACAGCAG
TAAGATATTTTCTAATTGAAAATTATCTTACTGCTG
>NZ_CHKO01000052.1 Staphylococcus aureus strain USFL094, whole genome shotgun sequence
TTCTTAGGCAATGTAAAAAAGCTGATTTCTATTAATTATTTGATAGAAATCAGCTTTTTTGATATGTATT
TTATAATGTACAGCTCGTTGAGCTGCTATTTTCCTTATATTAAGTGCCATTAATACAAAACCTAGCTCTC
...


Would result in the command line output:

file1.fasta    70


Valid Files

The getReadLength script can handle both fasta and fastq file formats. The script can also handle gzipped fastq files.


Files Must Be Fasta or Fastq:

File names must match one of the following patterns:

  • *.fq.gz
  • *.fastq.gz
  • *.fq
  • *.fqs
  • *.fastq
  • *.fastqs
  • *.fa
  • *.fas
  • *.fasta
  • *.fastas
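
The check against the file name patterns listed above can be sketched in Python. This is only an illustration and not the script's actual implementation: the names ACCEPTED_PATTERNS and is_accepted are hypothetical, and the sketch assumes the patterns are matched case-sensitively against the file name.

```python
from fnmatch import fnmatchcase

# Patterns copied from the list above.
ACCEPTED_PATTERNS = [
    "*.fq.gz", "*.fastq.gz",
    "*.fq", "*.fqs", "*.fastq", "*.fastqs",
    "*.fa", "*.fas", "*.fasta", "*.fastas",
]


def is_accepted(filename: str) -> bool:
    """Return True if filename matches any accepted pattern.

    Hypothetical helper; case-sensitive matching is an assumption.
    """
    return any(fnmatchcase(filename, pattern) for pattern in ACCEPTED_PATTERNS)
```

Under these assumptions, is_accepted("file1.fasta") and is_accepted("reads.fq.gz") are true, while is_accepted("notes.txt") is false.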


Read Length Gathered From Second Line of File:

The file formats accepted by getReadLength differ from one another, but in each of them the first line is a header and the second line holds sequence characters. For that reason, getReadLength always measures the length of genetic sequence characters on the second line of the input file.


Running the previous example on the file 'file1.fasta':

$ getReadLength file1.fasta


Would read the following lines of data:

>NZ_CHKO01000053.1 Staphylococcus aureus strain USFL094, whole genome shotgun sequence
TGTCTTATTTTTTTAAAGTATTTAAAAGTAAAATTACATGTTAATACGTAGTATTAATGGCGAGACTCCT
GAGGGAGCAGTGCCAGTCGAAGACAGGGGCCCCAACACAGAAGCTGACATATAGTCAGCTTACAACAATG
TGCCGGTTGGGGTGGCTGAGACGGCACCCTAGGAAGGGACCCGTCATCAAAAATTCTATTTATAGAATTT
TACAGTAATGTGCCAGATGGGCATAGCGAAGCCATTCAATACGAAGTATTGTATAAATAGAGAACAGCAG
TAAGATATTTTCTAATTGAAAATTATCTTACTGCTG
>NZ_CHKO01000052.1 Staphylococcus aureus strain USFL094, whole genome shotgun sequence
TTCTTAGGCAATGTAAAAAAGCTGATTTCTATTAATTATTTGATAGAAATCAGCTTTTTTGATATGTATT
TTATAATGTACAGCTCGTTGAGCTGCTATTTTCCTTATATTAAGTGCCATTAATACAAAACCTAGCTCTC
...


The script would then pull only the second line to acquire a genetic sequence character count:

TGTCTTATTTTTTTAAAGTATTTAAAAGTAAAATTACATGTTAATACGTAGTATTAATGGCGAGACTCCT


Resulting in the following command line output:

file1.fasta    70
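

The second-line behavior described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the script's actual implementation: the function names second_line_length and format_report are hypothetical, gzipped input is detected by the '.gz' extension, and the file name and length are assumed to be separated by a tab in the output.

```python
import gzip


def second_line_length(path: str) -> int:
    """Count the characters on the second line of a fasta/fastq file.

    Hypothetical helper: gzipped files are recognized by a '.gz' suffix.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        handle.readline()              # skip the header line
        sequence = handle.readline()   # second line holds the sequence
    # Drop the trailing line ending so only sequence characters are counted.
    return len(sequence.rstrip("\r\n"))


def format_report(path: str) -> str:
    """Build one output line; the tab separator is an assumption."""
    return f"{path}\t{second_line_length(path)}"
```

Because only the first two lines are read, the cost of this approach does not grow with file size. It also means that a file whose second line is shorter than later sequence lines would report the shorter length.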