This page is a quick catalogue of the scripts in the Aspen cluster's tnorth scratch bin folder. Many descriptions are partial and subject to change, and most will require future revisions. Some of the more complex scripts may eventually receive their own individual help pages; if you would like a specific help page created sooner, please contact the bioinformatics department.

Github: https://github.com/TGenNorth/TGen-North-Scratch-Bin

Paths: /packages/tnorth/bin OR /scratch/bin


Each entry below gives the script file/folder name followed by a general description.
/OLD/<script>
Sub folder.
Contains scripts too old to be usable. If a script found here has no newer alternative in scratch/bin and a newer version is needed, please make a request; otherwise these files will remain deprecated.
/Pyed-piper/<script>
Sub folder.
Contains Python-based edirect scripts. Many of these replace old Perl scripts that fulfill the same purpose.
deleteFiles.csh
C Shell Script.
Given a folder, deletes the files and folders within it that match bwamem, bowtie2, novo, snap, xml, spades, pilon, trimmomatic, or "working_directory".
g3-iterated.csh
C Shell Script.
Runs Glimmer3 on a given genome file and takes a tag to use for naming the output files.
replaceScripts.csh
C Shell Script.
Looks across a folder for Spades soft links and creates a replacement for each.
addSequenceDescriptions
Perl Script.
Takes in a fasta file and outputs a modified version with added descriptions if a GAS emm type or Borrelia is present.
append2xls
Perl Script.
Takes in a tsv file, an xml file, and optionally a name for the worksheet to create. The tsv should have two columns: key/sample and value. The key determines where in the xml the associated value data is appended.
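A minimal sketch of the two-column tsv input described above, with hypothetical sample names and values (columns are tab separated):

SampleA	42
SampleB	17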
assembleUnmappedReads
Bash Script.
For each bam in the current directory, creates an assembly from all the unaligned segments.
canSnp
Python 3 Script.

This script was requested by Jolene Bowers and intern Stephanie Casey as a simplified version of one of Jason Sahl's scripts. They were comparing trees and wanted to identify SNPs where a pair of clades split.

Takes in a tsv file containing a SNP matrix and two other tsvs for groups 1 and 2, each listing one sample name per line.
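A hypothetical invocation (the exact argument order is an assumption here; filenames are examples), where each group file lists one sample name per line:

canSnp snp_matrix.tsv group1.tsv group2.tsv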


combineRuns
Bash Script.
Combines all R1 fastq.gz files in the current directory.
countReads
Bash Script.
Takes as arguments either *.fastq (all fastq files in the directory) or a list of fastq files. Counts all reads in the given files and displays the count for each file on the console.
countReads_faster
Bash Script.
Counts all reads in all fastq files in the current directory. Output is printed to the console one line per file in the format: sample name, tab, count.
demultiplex
Bash Script.
Purpose: "Demultiplex a run from an Illumina sequencer. Works for all Illumina platforms."

Converts the bcl files found under the set run folder into fastq.gz files in the output directory, which will be created if it doesn't exist. You can include any additional parameters to bcl2fastq at the end. The job will be submitted to the job queueing system.

Arguments/Flag Options:

-h, --help => Prints usage message

-r, --runfolder_dir => Path to run folder directory

-o, --output_dir => Path to demultiplexed output

-s, --sample_sheet => Path to sample sheet

-l, --lane_splitting => Whether to split FASTQ files by lane, should be 0 for NextSeq, 1 for all others

-d, --debug_level => Minimum log level, recognized values: NONE,FATAL,ERROR,WARNING,INFO,DEBUG,TRACE

-p, --partition => SLURM partition to use, recognized values: gpu, defq
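An example invocation using the options above (the run folder path is hypothetical; -l 1 because this example is not a NextSeq run):

demultiplex -r /scratch/TGenNextGen/TGN-MiSeq0123/ -o ./demux_output -s SampleSheet.csv -l 1 -p defq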



covid_demultiplex
Bash Script.
Mimics the code found in demultiplex.

Differences from demultiplex: covid_demultiplex adds an extra job after the normal demultiplexing that runs the covid pipeline script (/labs/COVIDseq/COVIDpoint/post_demux_pipeline/pipeline.sh)


deleteFiles_new
Bash Script.
Searches the current directory for files related to bwamem, bowtie2, gatk, etc. Deletes all matching file types such as .tsv, .csv, .bam, .sorted, and similar.
deleteFiles_user
Bash Script.
Searches the user's scratch folder for files related to bwamem, bowtie2, gatk, etc. Deletes all related file types such as .tsv, .bam, .sorted, and so on.
dirtyDip
Bash Script.
Mimics the dirtySpades script.
Differences: the key difference is that dirtyDip uses the SLURM-based sbatch job system to handle tasks, while dirtySpades uses the qsub command.
dirtySpades
Bash Script.
Run dirtyDip without arguments; it will automatically search for fastq files in the current directory.
Advanced Options are available if you need to override the defaults. See -h or --help

Synopsis: dirtyDip will create an assembly fasta for each fastq sample in the current directory.

The fastqs are trimmed with Trimmomatic and assembled with Spades. The assembly will be output to a file named ./<SAMPLE_NAME>.spades/contigs.fasta. 

A symbolic link to the assembly, ./<SAMPLE_NAME>.fasta, is created for convenience.

Unless the --single flag is set, the script will attempt to pair the fastqs assuming they use one of the following naming conventions:

<SAMPLE_NAME>_GTGTCCTT-GTGTCCTT_L007_R1_001.fastq.gz

<SAMPLE_NAME>_GTGTCCTT-GTGTCCTT_L007_R2_001.fastq.gz

<SAMPLE_NAME>_S23_L001_R1_001.fastq.gz

<SAMPLE_NAME>_S23_L001_R2_001.fastq.gz

If the --single flag is used, each read is assembled separately. The filename is used as the SAMPLE_NAME.


This script will not detect read files that do not match the expression: *R1*.fastq* (this includes not detecting *.fq files).



download_ncbi_set
Bash Script.

Given a tab-delimited file of user-chosen names (e.g. M013) and NCBI ids (e.g. NC_016928.1), this script downloads each NCBI id from the nucleotide database using ncbi_entrez.py and saves it as a fasta file named with the user-chosen name.

USAGE:

download_ncbi_set <tab delim id file>
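A sketch of the tab-delimited input file, using the example name/id pair from above plus a second, hypothetical row:

M013	NC_016928.1
M014	NC_007795.1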
downloadSRA
Bash Script.

Downloads a list of accession numbers from SRA. Accessions should be SRP# or SRR#.

Usage:

-i <input file> => Specifies the input file. This should be a text file of accession numbers, one per line. 

-o <output directory> => Specifies the output directory. All read files will be downloaded to this directory. 

-h => Displays help message and exits.

Example: downloadSRA -i samples.txt -o /scratch/dlemmer/SRA/


extractBams
Perl Script.
Usage: extractBams dir_to_process results_dir
Synopsis:
Uses two existing directories: dir_to_process is searched for subdirectories and files from which bam file info is extracted and saved to results_dir (which is NOT created if it does not already exist).
The following directories will be ignored if found: 

read_metrics, sai, bam_link_unique, bamcoverage_unique_1,

bamcoverage_unique_10, bamcoverage_unique_noINDEL_1,

bamcoverage_unique_noINDEL_10, SolSNP1, SolSNP10

fastaStats
Perl Script.
Displays the following info gathered from a fasta file:

Filename, Total contigs, Total nt, Mean length, Median length, Mode length, Max length, Min length, Length of each sequence

fastqStats
Perl Script.
Displays the following info gathered from a fastq file:

Filename, Total contigs, Total nt, Mean length, Median length, Mode length, Max length, Min length, Length of each sequence


fixFasta
Perl Script.
Takes in a fasta file as its only argument. The script must be edited to define how sequences are 'fixed'; it is currently set to do nothing.
gbk2fasta
Perl Script.
Takes in a GenBank file as its only argument. Converts the file to fasta and writes it out to the same directory using the original file name.
gcg2fasta
Perl Script.
Usage: gcg2fasta <file_or_directory_of_gcgs> <output_fasta>
Converts the gcg or sds file(s) into fasta and writes the result to the given output.
generateGTF
Perl Script.
Uses a given fasta file to generate a gtf file.
getFlankingSequence
Python 3 script.
Given a reference, contig name, and position, extracts n bases of flanking sequence on each side of the position.
Flags:

-r, --reference => Required. Reference fasta to extract from

-o, --out => Required. Output fasta file to write

-i, --input => Required. Input file listing contig::position within reference

-f, --flank => Optional. Number of bases of flanking sequence to return. Default=500 
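An example invocation using the flags above (filenames and the contig::position entry are hypothetical):

getFlankingSequence -r reference.fasta -i positions.txt -o flanks.fasta -f 250

where positions.txt lists one contig::position entry per line, e.g. contig_1::123456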



getNumReads
Bash Script.
Searches for all .bam files in the current directory and displays info line by line in tab-separated order: name, reads, mapped.
getReadLength
Bash Script.
Searches the current directory for all files; expects fasta or fastq and displays a message for any non-fasta/fastq files encountered. Otherwise the script prints the length of every sequence found.
getVelvetInfo
Perl Script.
Usage: getVelvetInfo <dir_to_process>
Looks within the directory, processes all "*_logfile.txt" files from Velvet, and creates an assembly_details.txt file containing the following tab-separated columns: Sample, Chosen kmer, Num contigs, N50, Longest contig, Total bases in contigs.
initDemultiplex
Perl Script.
Generate the samplesheet(s) and a script to demultiplex a HiSeq or GAIIx sequencing run.
Usage: initDemultiplex [options]
Options:

'runlog' => The sequencing log spreadsheet [REQUIRED]

'runid' => The run ID, e.g. 'HiSeq0010' [REQUIRED]

'fcid' => The flowcell ID [REQUIRED]

'indexconverter' => The IndexConverter spreadsheet [REQUIRED]

Optional:

'control' => Whether a control was used, presence of this option means 'Y'

'operator' => Initials of who started the run, default is 'HQ'

'recipe' => Recipe used, default is '101x101withMP'

'initials' => Initials of the person running the script, will be used in SampleSheet filename

'extracycle' => Whether the indexing reads run an extra cycle, default is false

'help' => Print the help message



iqtree
Shell with Jenkins.
See http://www.iqtree.org/doc/Tutorial
Jenkinsfile
A Jenkins pipeline descriptor file. Used to determine whether shell status is normal or unstable.
jobstats
Python 3 script.

jobstats prints a summary table similar to the following command:

sacct -o jobid,jobname,reqmem,maxrss,reqcpus,usercpu,timelimit,elapsed,state


In addition, it includes a 'resource efficiency' report showing how well the allocated resources matched the used resources.

Usage: python jobstats.py [optional additional sacct args]

link_files
Bash Script.
Usage: link_files <txt file list> <place to begin search>
Searches based on the criteria and performs "ln -s..." to create links within the current directory.
make_gene_mlst_db
Bash Script.
Usage: make_gene_mlst_db <analysis type: gene or mlst> <organism, e.g. saureus>
For gene mode: creates a blast database for each gene (all within the current directory)

For mlst mode: if not already downloaded, downloads MLST data from pubmlst for the given organism. Extracts the first alleles into separate files, creates a blast database for each housekeeping gene (all within the current directory)


make_MLST_db
Bash Script.
Usage: make_MLST_db <organism, e.g. saureus> <path to mlst gene and profile files>

Downloads MLST data from pubmlst for the given organism, extracts the first alleles into separate files, and creates a blast database for each housekeeping gene (all within the current directory)


mash_screen
Slurm sbatch bash script.
Usage: sbatch mash_screen
Creates a SLURM job array that takes all *.fastq.gz files in the current directory and calls the mash script
(See https://mash.readthedocs.io/en/latest/index.html).
mergeReads
Bash Script.
Looks for all fastq/fq files and uses PEAR plus the PBS job manager to handle read merging.
mksquashfs
Non-TGen Compiled Code.
https://www.mankier.com/1/mksquashfs#Synopsis
mlst
Perl Script.
Requires BLAT and gzip. Downloads PubMLST data and then uses the chosen scheme to create an output csv.
Usage: mlst [options] --scheme XXX <contigs.fasta> ...
CMD Options:

help, verbose, datadir, list, longlist, scheme, noheader, csv, nopath
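A hypothetical invocation (scheme name and assembly filenames are examples):

mlst --csv --scheme saureus assembly1.fasta assembly2.fasta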


mlst-download_pub_mlst
Bash Script.
Runs the following commands and handles the resulting data:

wget 'http://pubmlst.org/data/dbases.xml' --no-check-certificate

rm -rf mlst_db/*

modifyFasta
Bash Script.
Usage: modifyFasta <fasta_file>
Modifies the fasta file; currently set to edit the name by replacing '-' with '_'.
nasp_matrix_to_fasta
Perl Script.

Converts a NASP snp matrix file into a fasta file

Usage:

Copy the script to your local machine, then on the command line type:

nasp_matrix_to_fasta <source_file> <output_file> <max_seq_length>

Inputs-

    source_file - a tab delimited snp matrix file as output by NASP

    output_file - the name of the fasta file to output

    max_seq_length - this is the max length for each line in a sequence. A line break is inserted at that point. Defaults to 60.

Output-

    fasta file (all snps for that organism concatenated together)
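An example invocation following the usage above (filenames are hypothetical):

nasp_matrix_to_fasta bestsnp_matrix.tsv concatenated_snps.fasta 60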

neben
Non-TGen Compiled Code.
Unknown TODO
new_script
Bash Script [Non-TGen].

Usage: new_script [-h|--help] [-q|--quiet] [-s|--root] [script]

License:

new_script - Bash shell script template generator


Copyright 2012, William Shotts <bshotts@users.sourceforge.net>


This program is free software: you can redistribute it and/or modify

it under the terms of the GNU General Public License as published by

the Free Software Foundation, either version 3 of the License, or

(at your option) any later version.


This program is distributed in the hope that it will be useful,

but WITHOUT ANY WARRANTY; without even the implied warranty of

MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

GNU General Public License at <http://www.gnu.org/licenses/> for

more details.


Revision history:

   2014-03-20  Corrected bug in insert_help_message() discovered by Lev Gorenstein <lev@ledorub.poxod.com> (3.3)

   2014-01-21  Minor formatting corrections (3.2)

   2014-01-12  Various cleanups (3.1)

   2012-05-14  Created

paired_haplotype_smor.pl
Perl Script.
Usage: paired_haplotype_smor.pl <inputfile.bam>
Using the input bam file this script checks the sequences inside against a list of specific positions: eisPlus1000, gyrAPlus1000, inhAPlus1000, katGPlus1000, rpoBPlus1000, and rrsPlus1000. It outputs by printing out a tab separated table with the following headers: Chromosome, Pos | #AA, #CC, #GG, #TT, #Hom, | %AA, %CC, %GG, %TT, | #A, #C, #G, #T, #Cov, | %A, %C, %G, %T, | Raw
paired_haplotype_smor_ASAP.pl
Perl Script.
Usage: paired_haplotype_smor_ASAP.pl <inputfile.bam>
Works just like paired_haplotype_smor.pl, but the position names are shortened to just eis, gyrA, and so on (i.e. without the "Plus1000" suffix).
pbs_header.sh
Bash Script.

Version 1.1

Written by Joshua Colvin

Usage:

Takes two or three arguments:

- Directory command should be executed in.

- Command to execute.

Note that it is important to surround entire command with quotes

so that pipes and redirects are handled properly.

- If optional third argument is present, it contains the name of the file

to create if the program fails.

(make sure file doesn't exist before running)

Example: echo_and_exec $PBS_O_WORKDIR "wc -l *.fastq > fastq_line_count.txt"


This script mainly contains a single function called echo_and_exec, which executes the command via the PBS system and also prints the original command to the console.

Since Aspen no longer uses PBS this script is likely DEPRECATED.

pear_unmapped
Bash Script.
Usage:
Without arguments, the script will print a SLURM sbatch template that will submit a job for each .bam file in the directory. To run the template you can either:

    1. pipe the template directly to sbatch:

       ./pear_unmapped | sbatch


    2. save the template to a file and then submit to sbatch:

       ./pear_unmapped > any_filename

Then:

      sbatch any_filename

plasmidOrGenome
Bash Script.
Usage: ./plasmidOrGenome <input_fasta>
Creates an output file that is <input name>.pog.txt which contains the following tab separated headers and then lines of tab separated data:
contig name, contig length, plasmid hit ratio, top hit length, top hit, determination.
pmi_demultiplex
Bash Script.
Help Msg: ./pmi_demultiplex -h
Demultiplex a run from an Illumina sequencer. Works for all Illumina platforms.
Usage: ./pmi_demultiplex [OPTIONS] [ADDITIONAL OPTIONS PASSED DIRECTLY TO bcl2fastq]
Options:

-r | --runfolder_dir => Path to run folder directory (default: ./ )

-o | --output_dir => Path to demultiplexed output (default: ./Data/Intensities/Basecalls/)

-s | --sample_sheet => Path to sample sheet (default: ./SampleSheet.csv/)

-l | --lane_splitting => Whether to split FASTQ files by lane, should be 0 for NextSeq, 1 for all others (default: 0)

-d | --debug_level => Minimum log level, recognized values: NONE,FATAL,ERROR,WARNING,INFO,DEBUG,TRACE (default: WARNING)

-p | --partition => SLURM partition to use, recognized values: gpu, defq (default: defq)

prep_reads_with_numbers.py
Python Script.
Usage: python prep_reads_with_numbers.py <read 1 file> <read 2 file> <new read 1 file> <new read 2 file>
Takes in two read files, appends "-1" to all read 1 sequence names and "-2" to all read 2 sequence names, and saves the new versions at the given new file locations.
primerDimer
Python Script.
Usage: python primerDimer -h
 python primerDimer [-s, --sort] FASTA1 FASTA2

Takes in two fasta files and prints out info such as the interaction count, the primer scores, a total sum of scores, bond strength, and finally which of the two will have greater coverage.

Pull_clusters.py
Python Script.
Usage: python Pull_clusters.py <cluster_names> <consensus_file> <outputfile>
Pulls clusters out of consensus.seqs to align reads, using the given list of cluster names against the given fasta/consensus file; the results are written to the output file.
pull_contig_from_blat
Perl Script.
Usage: pull_contig_from_blat <fasta_file>
Creates a temporary file and then searches the display ids for "21863 217408 2870903 821+,...,3456+" and writes those to the temp file, and then finally overwrites the main file with the temp file.
Pull_full_hit_noSplice_fromPSL.py

Python Script.

Usage: python Pull_full_hit_noSplice_fromPSL.py <gene_seqs> <psl_file> <fasta_file> <output_destination/file_name>

Gene sequences are identified only by the first field of the fasta header.

Only works on nucleotide BLATs!

This script identifies if a gene seq given is a full hit within the fasta file or psl file and then writes the results to the given output fasta file.
Pull_gene_sequence_fromPSL_v3.py

Python Script.

Usage: python Pull_gene_sequence_fromPSL_v3.py <gene_seqs> <psl_file> <fasta_file> <output_destination/file_name>

Works just like Pull_full_hit_noSplice_fromPSL.py, but allows for splicing when determining gene sequence matches.
q.pl
Perl Script.
Takes in a single argument, which is a command to execute via the PBS job manager.
Since Aspen no longer uses PBS this script is likely DEPRECATED.
qdel-range
Shell Script.
This script kills a range or all of PBS/Torque jobs owned by the current user.
Since Aspen no longer uses PBS this script is likely DEPRECATED.
qrls-range
Perl Script.
This script kills a range or all of PBS/Torque jobs that possess a job id within the given range (inclusive)
Since Aspen no longer uses PBS this script is likely DEPRECATED.
removeShortContigs
Perl Script.
Usage: removeShortContigs <assembly.fasta> <cutoff_size>
Combs through the fasta, deletes any contigs that fall below the given size, and prints out the number removed.
rename
Perl Script (Non-TGen).
Usage: rename [-v] [-n] [-f] perlexpr [filenames]

Renames the filenames supplied according to the rule specified as the

first argument. The perlexpr argument is a Perl expression which is expected to modify the $_ string in Perl for at least some of the filenames specified. If a given filename is not modified by the expression, it will not be renamed. If no filenames are given on the command line, filenames will be read via standard input.

Example: To rename all files matching *.bak to strip the extension,

you might say


    rename 's/\.bak$//' *.bak


To translate uppercase names to lower, you'd use


    rename 'y/A-Z/a-z/' *


This script was developed by Robin Barker (Robin.Barker@npl.co.uk),

from Larry Wall's original script eg/rename from the perl source.

This script is free software; you can redistribute it and/or modify it

under the same terms as Perl itself.

renameContigs
Perl Script.
Usage: renameContigs <dir_to_process>
Searches through the directory given for all .fasta and .fa files and then alters their sequence names to have their display id appended to the end separated by an underscore.
renameContigs2
Perl Script.
Usage: renameContigs <dir_to_process>
Searches through the directory given for all .fasta and .fa files and then alters their sequence names to be just their sample name.
renameSamples
Python Script.
Usage: python renameSamples [-h, --help] [-c --column | -r --row | -a --fasta | -f --files ] [--sheet SHEET] [--old OLD] [--new NEW] <BOOK> <LOCATION>
This script renames various parts of files, using a provided workbook to determine what data to change and to what. The workbook and the location of the files to change are required. The --old and --new flags are also required and set the old headers and the new headers to be used. The flags -c, -r, -a, and -f decide which parts of the files the script edits. Lastly, --sheet is optional and determines which sheet in the workbook to use; it defaults to the first sheet.
renameSamples_column.pl
Perl Script.
Usage: renameSamples_column.pl <file to alter>
Renames the samples in the first column of the input file
renameSamples_fasta.pl
Perl Script.
Usage: renameSamples_fasta.pl <fasta file to alter>
Renames the samples in the fasta file that is passed in
renameSamples_files.pl
Perl Script.
Usage: renameSamples_files.pl <path to directory containing files>
Renames the files in the given directory
renameSamples_row.pl
Perl Script.
Usage: renameSamples_row.pl <SNP pipeline results file to alter>
Renames the samples in the first line of the SNP pipeline results file
replaceScript and replaceScript.csh
Bash Script.
Fixes and replaces Spades links; currently set to write a test output to /scratch/dlemmer/replace_spades_links.txt
reSeq_analysis.sh
Bash Script.
Usage Example: ./reSeq_analysis.sh --input-directory /scratch/TGenNextGen/TGN-MiSeq0123/ProjectMayhem/ --output-directory ./ --genome-size 1.2 --read-length 250 --min-coverage 20 --min-gc 42

authors: Kristin Wiggins <kwiggins@tgen.org> and Jason Travis <jtravis@tgen.org>

url: https://github.com/TGenNorth/tnorth (restricted access)

Overview

Anyone is welcome to use this script when quickly analyzing any whole genome sequence data.  There is a special feature to demultiplex the data before the quick analysis, which requires special permissions, but the rest of the script will work for everyone!

This should only be used as a quick look at the data and should not be used in place of detailed analysis techniques.

Pipeline Synopsis:

Automated quality checks on sequence data with the final output being two separate files of "Passed QC" and "Failed QC" with their associated metrics of GC content, average Phred Score, and Quick Coverage estimates.

Optional- Demultiplex data first; create a new directory of passed files.



reverseComplement
Python Script.
Usage: python reverseComplement [flag] <fasta, txt file, or string>
Flags:

-f, --fasta => Reverse complement all sequences in the fasta file, output new fasta file

-t, --text, => Reverse complement all sequences in the text file, output new text file

-s, --string => Reverse complement the passed string, output to STDOUT
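Example invocations using the flags above (the filename and sequence string are hypothetical):

python reverseComplement -f sequences.fasta

python reverseComplement -s ACGTTGCA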


RStudioServer
Bash/Sbatch Slurm Script.
Usage: sbatch RStudioServer
Loads singularity 3.3.0 and then:
"Starts RStudio Server on the cluster. Please run using sbatch. After starting, "cat rstudio-server.job.{slurmJobID}" for details"
run_bwa.pl
Perl Script.
Usage: run_bwa.pl -help
Performs read alignments across multiple references and calculates coverage stats; runs bwa, solsnp, bam_coverage, read metrics, and SnpPipeline-0.4.jar.
Options:

  -alignment <type of alignment: single/paired>

  -analysis <type of analysis: gene/full/both/bamcov/none>

  -reference <comma separated list of reference prefixes>

  -organism <string>

  -p <path to sequence folder>

  -snp_pipe <to run or not to run SnpPipeline-0.4.jar. Values: y/n>

  -ext_p <path to external fastas>

  -aln <memory needed to run bwa and picard. Format: integer followed by kb/mb/gb, recommended minimum: 4G> WILLOW ONLY

  -bamcov <memory needed to run solsnp1 and solsnp1. Format: integer followed by kb/mb/gb, recommended minimum: 5G> WILLOW ONLY

  -covcalc <memory needed to run gff_intersection.py and bamCovCalc.pl. Format: integer followed by kb/mb/gb, recommended minimum: 2G> WILLOW ONLY

  -snppipeline <memory needed to run SnpPipeline-0.4.jar, snpfixer_040512.php and snp_matrix_to_fasta.v2.pl. Format: integer followed by kb/mb/gb, recommended minimum: 5G> WILLOW ONLY
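A hypothetical invocation combining some of the options above (reference prefixes, organism, and sequence path are examples only):

run_bwa.pl -alignment paired -analysis full -reference ref1,ref2 -organism saureus -p /scratch/myuser/sequences -snp_pipe y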



run_bwa_1xcov.pl
Works just like run_bwa.pl, but has been edited to use /media/lumberyard/bin/bwa_match_auto_040512.pbs and ~${username}/lumberyard/bin/generic.pbs. This script also works on fastq files rather than the compressed fastq.gz files that run_bwa.pl uses by default.
run_bwa_040512.pl
Works just like run_bwa.pl, but has been edited to use different defaults, do_all_reads is true, memory required for all unique is 2gb instead of 1gb. Uses for pbs: /media/lumberyard/bin/bwa_match_auto_040512.pbs

run_bwa_040512_novoalign.pl
Works just like run_bwa.pl, but has been edited to perform similarly to run_bwa_040512.pl, with additions to the PBS job to call novoalign.
run_bwa_072611_noMarkDup.pl
Works just like run_bwa.pl, but has been edited to search through a sequences directory and make calls, ignoring duplicates.
run_bwa_callreference.pl
Works just like run_bwa.pl, but has been edited to call /media/lumberyard/bin/bwa_match_auto_callreference.pbs to handle PBS commands
SCCmecType.pl
Perl Script
Usage: SCCmecType.pl <assembly_file>|<directory> <output_file>
Performs one of two functions depending on the arguments used. If only one argument is provided, it is assumed to be an assembly file, and the script gets the product and output for each primer set found within, using a fixed output filename based on the original file. If two arguments (a directory and an output file) are given, the script uses the files found in the provided directory to collect data, which it then writes to the output file as a tsv table with the headers: Sample, ccr type, mec class, SCCmec Type
screenDir
Bash Script
Executes: sbatch --array=1-$(ls -1 *R1*.fastq.gz | wc -l) /scratch/bin/mash_screen.slurm
scriptseq_adapter_seqs.fasta
Fasta File
Contains:

>Read1_over_read

AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC

>Read2_over_read

AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGG

sdsi_pipeline_run
Bash Script
Usage: scriptTemplate [-s|-o|-r|-p|-h]

Options:

   -s | --sequence_directory => Sequence Directory

   -o | --output_directory => Output Directory

   -r | --reference_sdsi => SDSI Reference File

   -p | --adapter_file => Adapter File

   -h | --help => Print Help.

Loads an Anaconda environment and calls Snakemake to execute an SDSI pipeline, which prompts the user to make any other necessary decisions and then calls /scratch/cridenour/Projects/SDSI/SDSIVisuals/SDSiVisuals/workflow/scripts/check_if_sdsi.py
Author: Chris Ridenour
SEdirtySpadesNoTrim
Bash Script
When run, checks the current directory for fastq files and then uses PBS to run a Spades job on each file, set to be non-trimming.
select_subsampling.pl
Perl Script

Usage:

perl select_subsampling.pl <fastafile> <numsnpstokeep> <numiterations> <outputfolder> <independent|linked> <isolate1> [isolate2] [isolate3]...

Opens and searches the given fasta file for SNPs that meet the criteria. The script then randomly keeps some of them, based on the value of numsnpstokeep. The randomly kept sets are saved into a handful of files in the specified output folder.
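A hypothetical invocation following the usage above (filenames and isolate names are examples):

perl select_subsampling.pl snps.fasta 100 10 ./subsampled independent isolateA isolateB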
separateBSRResults
Perl Script
Usage: perl separateBSRResults [help | filter] <file>
Takes in a file and an optional filter value. The script uses the filter value (default 0.8) and removes any lines in the file with a greater value. The file is assumed to contain tab-separated data that includes numeric fields, the first of which is used for the comparison.

sequencingQC
Bash Script
Usage Example: ./sequencingQC --input-directory /scratch/TGenNextGen/TGN-MiSeq0123/ProjectMayhem --output-directory ./ --genome-size 1.2 --read-length 250 --min-coverage 20 --min-gc 42

Anyone is welcome to use this script when quickly analyzing any whole genome sequence data.  There is a special feature to demultiplex the data before the quick analysis, which requires special permissions, but the rest of the script will work for everyone!

This should only be used as a quick look at the data and should not be used in place of detailed analysis techniques.

Pipeline Synopsis:

Automated quality checks on sequence data with the final output being two separate files of "Passed QC" and "Failed QC" with their associated metrics of GC content, average Phred Score, and Quick Coverage estimates.

Optional- Demultiplex data first; create a new directory of passed files.


All functions assume they are called from the output_directory
shuffleSequences_fastq.pl
Perl Script
Usage: perl shuffleSequences_fastq.pl <fastq A> <fastq B> <fastq file to output>
Outputs the contents of A and B interleaved (4 lines from A, then 4 from B, repeating) into the specified output file.
single_read_error_tb.pl
Perl Script
Usage: perl single_read_error_tb.pl <inputfile.bam> <min_coverage>
Prints results to stdout (the console). This script checks the given bam file and outputs the following data under these headers (tab separated): Chromosome, Min proportion non-most-frequent call (excl), Max proportion non-most-frequent call (incl), Position count, All
This script uses position ranges: eisPlus1000, gyrAPlus1000, inhAPlus1000, katGPlus1000, rpoBPlus1000, rrsPlus1000

single_read_error_wg.pl
Perl Script
Usage: perl single_read_error_wg.pl <inputfile.bam> <min_coverage>
Works just like single_read_error_tb.pl except it does not use/include the position ranges found in the tb version.
sjobs
Contains the following text:
sacct -o jobid,jobname,reqmem,maxrss,reqcpus,usercpu,timelimit,elapsed,state
snp_subsampling.pl
Link to a deprecated version of this script; see snp_subsampling2.pl.
snp_subsampling2.pl
Perl Script
Usage: perl snp_subsampling.pl <fastafile> <numsnpstokeep> <numiterations> <outputfolder>
Searches through the given fasta file for SNPs and generates a number of random batches equal to numiterations, each containing numsnpstokeep SNPs, saved in the output folder under their own naming scheme.
snpdist
Shell Script (Compiled, thus content not human readable)
Usage: ./snpdist -h [?]
TODO
snpfixer.php
PHP Script (Symbolic Link, real file is in: /nextgen_snp_pipeline/)
Usage: php snpfixer.php <smallfile.txt> <largefile.txt> <resultfile>

Takes in two '.txt' or '.xls' files that look like this:

snp        Sample1_ID  Sample2_ID  Sample3_ID  ...
SNP1_ID    A           T           T           ...
SNP2_ID    G           G           T           ...
SNP3_ID    A           A           A           ...
...

It will then match up all SNPs that are present in both files. If a call at the same position on the same sample differs between the two, it will be changed to 'N'. Any SNP not present in both files will be discarded. The resulting file will be returned. This tool is useful if you have stringent requirements on what gets considered a valid SNP, but then want a downstream tool not to assume excluded SNPs must match the reference. This version removes all SNP positions that have N > 0 and/or are not bi-allelic.

snpfixer_040512.php
Works like snpfixer.php. No code difference from what I can tell.
SnpPipeline.jar
TODO (compiled code, needs to be looked up)
SnpPipeline-0.4.jar
TODO (compiled code, needs to be looked up)
splitFastaSequences
Perl Script
Usage: perl splitFastaSequences <fasta_file>
Uses BioPerl to re-output the fasta file to a temp file, which then replaces the original. The main benefit of the script is that it ensures consistent formatting; it does not make any alterations to the data.
sra-blastn
TODO Compiled code, needs lookup
sra-tblastn
TODO Compiled code, needs lookup
srst2_mlst
Bash Script
Usage: srst2_mlst [-s optional flag for species]
DEPRECATED - USES PBS JOB MANAGER
Runs the getmlst.py python script and the PBS job manager to handle all available fastq files in the run directory. Uses the srst2 module on Aspen to process the read and species data.
srst2_mult
srst2_mult is the previous version of srst2_multiple.
srst2_multiple
Bash Script
Usage: srst2_multiple
Works similarly to srst2_mlst but does not accept any arguments. Uses the PBS job manager and the srst2 module on all fastq files in the directory it is run in.
srst2_wrapper
Bash Wrapper Script
Usage: srst2_wrapper <any args to pass along to srst2.py>
Calls python /packages/srst2/0.1.4/scripts/srst2.py and passes any args to it as if it were run in place of the wrapper.
staphTyping.pl
Perl Script
Usage: perl staphTyping.pl <assembly_file>|<directory> <output_file>
Uses either an assembly file, or a directory and an output file. This script contains multiple primer sets which are used to perform checks on the input data. If just an assembly file is supplied, it checks for product across all primer sets and outputs the results to the screen. If a directory and output file are supplied, the directory is searched for any files that appear to be assembly files (e.g. .fa, .fasta, names containing 'final_assembly', etc.). Each of these files is then checked against the primer sets and the output is tabulated and stored in the output file. The product is calculated by running an amplicon search with the forward and reverse primers against the assembly data; this search can potentially find a position and PCR product size.
stress
TODO Compiled code lookup
taxid2tsa.pl
Perl Script.
Usage: perl taxid2tsa.pl [options] <taxid1> [ <taxid2> ... <taxidN> ]
Options:

-exclude => Accepts taxids to exclude from the output (default: None)

-alias_file => File base name (no extension) to save results into an alias file.

-title => Title to include in the generated alias file. Required if alias_file is provided.

-url_api_ready => Produce output that can be used in the NCBI URL API (default: false)

-verbose, -v => Produce verbose output, can be specified multiple times for increased verbosity (default: false)

-help, -? => Displays this man page.


Retrieves TSA projects for given NCBI taxonomy IDs

AUTHOR: Christiam Camacho (camacho@ncbi.nlm.nih.gov)

taxid2wgs.pl
Perl Script.
Usage: perl taxid2wgs.pl [options] <taxid1> [ <taxid2> ... <taxidN> ]
Options:

-exclude => Accepts taxids to exclude from the output (default: None)

-alias_file => File base name (no extension) to save results into an alias file.

-title => Title to include in the generated alias file. Required if alias_file is provided.

-url_api_ready => Produce output that can be used in the NCBI URL API (default: false)

-verbose, -v => Produce verbose output, can be specified multiple times for increased verbosity (default: false)

-help, -? => Displays this man page.


Retrieves WGS projects for given NCBI taxonomy IDs.

AUTHOR: Christiam Camacho (camacho@ncbi.nlm.nih.gov)

tblastn_vdb
TODO Compiled code lookup
transpose_tab_delim.py
Python Script
Usage: python transpose_tab_delim.py

Reads in a tab-delimited file and outputs a transposed version.

Use the --start-line and --end-line options to control the range of lines from a file that are used for the data.

Options:

-d, --debug = display debug messages

--log = name of file to write log data to (defaults to STDERR)

-i, --in = name of file to read from (defaults to STDIN)

-o, --out = name of file to write to (defaults to STDOUT)

-s, --start-line = the 0-based line number of the first line to use as data (defaults to 0)

-e, --end-line = the 0-based line number of the last line to use as data (defaults to last line)
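An example invocation using the options above (filenames are hypothetical):

python transpose_tab_delim.py -i matrix.tsv -o matrix_transposed.tsv --start-line 0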

trfTODO Compiled code lookup
trim_cocci.sh
Simple Shell File
Contains:

for i in *R1*; do k=`echo $i | sed 's/_001.fastq.gz//g'`; j=`echo $i | sed 's/_R1_/_R2_/g'`; l=`echo $j | sed 's/_001.fastq.gz//g'`; java -jar /scratch/jsahl/tools/UGAP/bin/trimmomatic-0.30.jar PE -threads 8 $i $j "$k"_trim_paired_1.fastq.gz "$k"_trim_unpaired_1.fastq.gz "$l"_trim_paired_2.fastq.gz "$l"_trim_unpaired_2.fastq.gz ILLUMINACLIP:scriptseq_adapter_seqs.fasta:2:25:10 MINLEN:60; done

trimmomatic-0.30.jar
http://www.usadellab.org/cms/?page=trimmomatic
http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/TrimmomaticManual_V0.32.pdf
trimmomatic-0.32.jar
http://www.usadellab.org/cms/?page=trimmomatic
http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/TrimmomaticManual_V0.32.pdf
trimReads
Bash Script
Runs Trimmomatic on all fastq files in the current directory using slurm sbatch
Job:
java org.usadellab.trimmomatic.TrimmomaticPE -threads 4 $read1 $read2 ${sample}_R1_paired.fastq.gz ${sample}_R1_unpaired.fastq.gz ${sample}_R2_paired.fastq.gz ${sample}_R2_unpaired.fastq.gz ILLUMINACLIP:/scratch/bin/illumina_adapters_all.fasta:4:30:10:1:true SLIDINGWINDOW:5:20 MINLEN:60
trimReads_notrim
Bash Script
Runs Trimmomatic on all fastq files in the current directory using slurm sbatch
Job:
java -jar /packages/trimmomatic/0.36/trimmomatic-0.36.jar PE -threads 4 $read1 $read2 ${sample}_R1_paired.fastq.gz ${sample}_R1_unpaired.fastq.gz ${sample}_R2_paired.fastq.gz ${sample}_R2_unpaired.fastq.gz HEADCROP:0
trimReads_qsub
Bash Script
Runs Trimmomatic on all fastq files in the current directory using PBS job manager
Job:
java -jar /scratch/bin/trimmomatic-0.32.jar PE -threads 4 $read1 $read2 ${sample}_R1_paired.fastq.gz ${sample}_R1_unpaired.fastq.gz ${sample}_R2_paired.fastq.gz ${sample}_R2_unpaired.fastq.gz ILLUMINACLIP:/scratch/bin/illumina_adapters_all.fasta:2:30:10 SLIDINGWINDOW:5:20 MINLEN:80
trimReadsML60
Bash Script
Runs Trimmomatic on all fastq files in the current directory using slurm sbatch
Job:
java -jar /packages/trimmomatic/0.36/trimmomatic-0.36.jar PE -threads 4 $read1 $read2 ${sample}_R1_paired.fastq.gz ${sample}_R1_unpaired.fastq.gz ${sample}_R2_paired.fastq.gz ${sample}_R2_unpaired.fastq.gz ILLUMINACLIP:/scratch/bin/illumina_adapters_all.fasta:2:25:10:1:true SLIDINGWINDOW:5:20 MINLEN:60
trimReadsML60_baseAdapterFile
Bash Script
Runs Trimmomatic on all fastq files in the current directory using slurm sbatch
Job:
java -jar /packages/trimmomatic/0.36/trimmomatic-0.36.jar PE -threads 4 $read1 $read2 ${sample}_R1_paired.fastq.gz ${sample}_R1_unpaired.fastq.gz ${sample}_R2_paired.fastq.gz ${sample}_R2_unpaired.fastq.gz ILLUMINACLIP:/scratch/bin/illumina_adapters_no_readthrough.fasta:2:25:10 SLIDINGWINDOW:5:20 MINLEN:60
trimReadsnqtrim
Bash Script
Runs Trimmomatic on all fastq files in the current directory using slurm sbatch
Job:
java -jar /scratch/bin/trimmomatic-0.32.jar PE -threads 4 $read1 $read2 ${sample}_R1_paired.fastq.gz ${sample}_R1_unpaired.fastq.gz ${sample}_R2_paired.fastq.gz ${sample}_R2_unpaired.fastq.gz ILLUMINACLIP:/scratch/bin/illumina_adapters_all.fasta:2:25:10 MINLEN:60
Trinity_GG_PASA_Auto.sh
Bash Script
Usage: Trinity_GG_PASA_Auto.sh <genome> <reads_1> <reads_2>
Uses samtools, gsnap, and the perl script alignReads.pl to align and create sorted coords to then run the perl script Launch_PASA_pipeline.pl
Trinity_GG_PASA_Auto_JG.sh
Bash Script
Usage: Trinity_GG_PASA_Auto.sh <genome> <reads_1> <reads_2>

Uses samtools, gsnap, and the perl script alignReads.pl to align and create sorted coords to then run the perl script Launch_PASA_pipeline.pl 
+ adds in the use of bowtie
updateModDate
Bash Script
A directory (usually a TGenNextGen subfolder) set inside the code will be looped over and have its modification date changed.
updateModDate2
Bash Script
Similar to updateModDate, but has only a READFILE variable available to change
upload2pathogen
Bash Script
Usage: upload2pathogen <name>
Performs:

chmod -R o+r $name*.html $name

chmod o+x $name

rsync -r $name*.html $name ${USER}@pbc-cpimtab-01.tgen.org:/var/www/pathogen.tgen.org/ASAP

usearch
TODO Compiled code lookup
variantstatistics_v2.1.pl
Perl Script
Usage: perl <perl_script> <MINOR VARIANT FILE> <UPPER LIMIT> <INTERVAL> <OUTPUT FILE> <TOTAL ERROR CALLS (y/n)>

created by Arun Rawat Version 2.1

Modified on Aug 8th 2012

This version does not generate other files like mean, std dev.

vdb
TODO Compiled code lookup
vdbCreate
TODO Compiled code lookup
velvet_multiple
Bash Script
Only handles fastq.gz. Runs VelvetOptimiser.pl on all isolates listed in isolates.txt
Note: this script uses and depends on the PBS system
Usage: velvet_multiple [options] <isolates.txt> <reads_dir> <starting_kmer> <ending_kmer> [threads_to_use]
Note: velveth doesn't seem to abide by ncpus in PBS or threads_to_use (may esplode your computer)
Options:
-h, --help = display help text and more info
-a <date_time> = Declares the time after which the job is eligible for execution. The date_time argument is in the form: [[[[CC]YY]MM]DD]hhmm[.SS]
velvet_multiple_oldschool
Bash Script
Only handles fastq.gz. Runs VelvetOptimiser.pl on all isolates listed in isolates.txt
Note: this script uses and depends on the PBS system
Usage: velvet_multiple [options] <isolates.txt> <reads_dir> <starting_kmer> <ending_kmer> [threads_to_use]
Note: velveth doesn't seem to abide by ncpus in PBS or threads_to_use (may esplode your computer)
Options:
-h, --help = display help text and more info
-a <date_time> = Declares the time after which the job is eligible for execution. The date_time argument is in the form: [[[[CC]YY]MM]DD]hhmm[.SS]
which_barcodes_work.py
Python 3 Script
Usage: python which_barcodes_work.py [options] <barcode_file_x> <barcode_file_y>
Options:

-o = output file (default: stdout)

-m = min hamming distance (default: 3)

--color-balance = treat G:T and A:C as equivalent (recommended)
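An example invocation using the options above (barcode filenames are hypothetical):

python which_barcodes_work.py -m 3 --color-balance -o compatible_barcodes.txt barcodes_x.txt barcodes_y.txt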

yield_approximation
Bash Script
Script to estimate depth of coverage (x) given a read file
Usage: yield_approximation <read file> <read length> <genome size>
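A hypothetical example, assuming depth is approximated as (number of reads x read length) / genome size and that genome size is given in bases:

yield_approximation sample_R1.fastq.gz 250 4000000

With 1,600,000 reads of length 250 against a 4,000,000 base genome, the estimate would be roughly (1,600,000 x 250) / 4,000,000 = 100x.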