Usage

The renameSamples script renames pieces of data within a document, or renames any number of files in a directory, depending on the arguments given. It first reads the required Excel spreadsheet from the imported workbook and builds a hash table whose keys and values are taken from that spreadsheet. This hash table then determines how the data sections or files are renamed.


Command Line:

renameSamples is called with the following command line syntax:

$ renameSamples [-h] [-c | -r | -a | -f] [--sheet SHEET] [--old OLD] [--new NEW] BOOK LOCATION


Arguments:

The following is a list of arguments for renameSamples:

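[-h, --help]
Displays the help message for renameSamples and exits
[-c, --column | -r, --row | -a, --fasta | -f, --files]
Rename first-column data, first-row data, FASTA contig IDs, or files in a directory (exactly one is required)
[--sheet SHEET]
Name of the spreadsheet in the workbook to read (optional; defaults to the first spreadsheet)
[--old OLD]
Header name, in the first row of the spreadsheet, of a column holding old names (hash table keys); required, and may be given more than once
[--new NEW]
Header name, in the first row of the spreadsheet, of the column holding new names (hash table values); required
BOOK
Path to the Excel workbook used to build the hash table; must be in xlsx format
LOCATION
Path to the file whose data will be renamed, or to the directory whose files will be renamed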


Example:

For example, the following command renames the contig IDs in the FASTA file 'file1.fasta', using the workbook 'rename.xlsx' to build the rename hash table:

$ renameSamples -a --old Old_Names --new New_Names /directory1/rename.xlsx /directory1/file1.fasta


Running this command produces the following command line output:

====================( Setting Up Job: Rename Samples - Fasta )
143 of 143 lines read (100%)
30 of 30 sections written (100%)
====================( Total Time: 0:0:0:3 )
====================( Job Complete: Rename Samples )


This would create a new file 'Renamed_file1.fasta', with the contig IDs renamed, in the same directory as the original file 'file1.fasta'.


For example, the original FASTA file 'file1.fasta' may contain the following data:

>NZ_CHKO01000053.1 Staphylococcus aureus strain USFL094, whole genome shotgun sequence
TGTCTTATTTTTTTAAAGTATTTAAAAGTAAAATTACATGTTAATACGTAGTATTAATGGCGAGACTCCT
GAGGGAGCAGTGCCAGTCGAAGACAGGGGCCCCAACACAGAAGCTGACATATAGTCAGCTTACAACAATG
...


The resulting file 'Renamed_file1.fasta' may then contain the following data, based on the hash table built from the 'rename.xlsx' workbook:

>NZ_CHKO01000053.1
TGTCTTATTTTTTTAAAGTATTTAAAAGTAAAATTACATGTTAATACGTAGTATTAATGGCGAGACTCCT
GAGGGAGCAGTGCCAGTCGAAGACAGGGGCCCCAACACAGAAGCTGACATATAGTCAGCTTACAACAATG
...


Column, Row, Fasta, or Files Arguments

The column, row, fasta, and files arguments are mutually exclusive, and exactly one of them must be given. The chosen argument determines which action the script takes. The column, row, and fasta arguments require a file for the location argument; the files argument requires a directory.
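
These rules map naturally onto a required, mutually exclusive option group. The Python sketch below expresses them with the standard argparse module. It is a hypothetical reconstruction, not the renameSamples source, and build_parser and check_location are illustrative names. Because argparse prints required options without square brackets, its generated usage line would differ slightly from the one shown under Command Line.

```python
import argparse
import os


def build_parser():
    """Hypothetical argparse reconstruction of the renameSamples command line."""
    parser = argparse.ArgumentParser(prog="renameSamples")
    # Exactly one action: the group is both mutually exclusive and required.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("-c", "--column", action="store_true", help="rename first-column data")
    mode.add_argument("-r", "--row", action="store_true", help="rename first-row data")
    mode.add_argument("-a", "--fasta", action="store_true", help="rename FASTA contig IDs")
    mode.add_argument("-f", "--files", action="store_true", help="rename files in a directory")
    parser.add_argument("--sheet", help="spreadsheet to read (default: the first one)")
    # --old may be repeated; argparse collects every occurrence into a list.
    parser.add_argument("--old", action="append", required=True, help="old-name column header")
    parser.add_argument("--new", required=True, help="new-name column header")
    parser.add_argument("book", metavar="BOOK", help="xlsx workbook")
    parser.add_argument("location", metavar="LOCATION", help="file or directory")
    return parser


def check_location(args):
    """Require a directory for --files and a regular file for every other mode."""
    if args.files:
        if not os.path.isdir(args.location):
            raise SystemExit(f"LOCATION must be a directory: {args.location}")
    elif not os.path.isfile(args.location):
        raise SystemExit(f"LOCATION must be a file: {args.location}")
```

A script built this way would call build_parser().parse_args() and then check_location(args) before reading the workbook.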


Column:

The column argument renames the data in the first column of a document. Each line is split into columns using the tab character ('\t') as the delimiter, and the first column of every row is renamed.


Renaming the data in the first column of file 'file1.txt' can be done with the following example:

$ renameSamples -c --old Old_Names --new New_Names /directory1/rename.xlsx /directory1/file1.txt


This would create a new file 'Renamed_file1.txt' in the same directory as the original file 'file1.txt', with the first column renamed according to values in the hash table.


For example, the original file 'file1.txt' may contain the following data:

LocusID Reference         Cg-B6864_4::pre-aligned,pre-called      Cg-B7422_concat_3::pre-aligned,pre-called ...
R265-Contig_1::82238      A       A       A       G       A       A       A       A       A       A       A       A
R265-Contig_1::161872     T       T       T       T       T       T       T       T       T       T       T       T
R265-Contig_10::52985     A       A       A       A       A       A       A       A       A       A       A       A
...


The resulting file 'Renamed_file1.txt' may then contain the following data, based on the hash table built from the 'rename.xlsx' workbook:

LocusID Reference       Cg-B6864_4::pre-aligned,pre-called       Cg-B7422_concat_3::pre-aligned,pre-called ...
R265_BC_human_VGIIa_2001        A       A       A       G       A       A       A       A       A       A       A       A
R265_BC_human_VGIIa_2001        T       T       T       T       T       T       T       T       T       T       T       T
R265_BC_human_VGIIa_2001        A       A       A       A       A       A       A       A       A       A       A       A
...
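
The column behavior can be sketched in Python as follows. This is a hypothetical illustration, not the script's code, and rename_value, rename_first_column, and renamed_path are illustrative names. The matching rule is an assumption: the example output above turns several different IDs ('R265-Contig_1::82238', 'R265-Contig_10::52985') into the same new name, which suggests a value is matched by its beginning rather than exactly, so the sketch replaces a value that starts with a key and leaves other values unchanged.

```python
import os


def rename_value(value, table):
    """Return the new name for value, or value unchanged when no key matches.

    Assumption (inferred from the examples): a value is replaced when it starts
    with a key; if several keys match, the longest one wins.
    """
    matches = [key for key in table if key and value.startswith(key)]
    if not matches:
        return value
    return table[max(matches, key=len)]


def rename_first_column(lines, table):
    """Rename the first tab-delimited field of every line."""
    renamed = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        fields[0] = rename_value(fields[0], table)
        renamed.append("\t".join(fields) + "\n")
    return renamed


def renamed_path(path):
    """Output path: 'Renamed_<name>' in the same directory as the input file."""
    directory, name = os.path.split(path)
    return os.path.join(directory, "Renamed_" + name)
```

With a hypothetical key 'R265' mapped to 'R265_BC_human_VGIIa_2001', the data rows above are renamed while the 'LocusID' header line, which matches no key, is left unchanged.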


Row:

The row argument renames the data in the first row of a document. The first row is split into sections using the tab character ('\t') as the delimiter, and each section in the first row is renamed.


Renaming the data in the first row for file 'file1.txt' can be done with the following example:

$ renameSamples -r --old Old_Names --new New_Names /directory1/rename.xlsx /directory1/file1.txt


This would create a new file 'Renamed_file1.txt' in the same directory as the original file 'file1.txt', with the first row renamed according to the hash table.


For example, the original file 'file1.txt' may contain the following data:

LocusID Reference         Cg-B6864_4::pre-aligned,pre-called      Cg-B7422_concat_3::pre-aligned,pre-called ...
R265-Contig_1::82238      A       A       A       G       A       A       A       A       A       A       A       A
R265-Contig_1::161872     T       T       T       T       T       T       T       T       T       T       T       T
R265-Contig_10::52985     A       A       A       A       A       A       A       A       A       A       A       A
...


The resulting file 'Renamed_file1.txt' may then contain the following data, based on the hash table built from the 'rename.xlsx' workbook:

LocusID Reference         B6864_OR_human_VGIIa_2004       B7422_OR_cat_VGIIa_2009 B7436_CA_alpaca_VGIIa_2009 ...
R265-Contig_1::82238      A       A       A       G       A       A       A       A       A       A       A       A
R265-Contig_1::161872     T       T       T       T       T       T       T       T       T       T       T       T
R265-Contig_10::52985     A       A       A       A       A       A       A       A       A       A       A       A
...
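
A minimal Python sketch of the row behavior follows; rename_value and rename_first_row are hypothetical names, not the script's code. Given the hash table shown under Hash Table, the example above renames 'Cg-B6864_4::pre-aligned,pre-called' to 'B6864_OR_human_VGIIa_2004' even though the key is only 'Cg-B6864', so the sketch assumes a field is renamed when it starts with a key. Fields that match no key, such as 'LocusID' and 'Reference', are kept.

```python
def rename_value(value, table):
    """Return the new name for value, or value unchanged when no key matches.

    Assumption (inferred from the examples): a value is replaced when it starts
    with a key; if several keys match, the longest one wins.
    """
    matches = [key for key in table if key and value.startswith(key)]
    if not matches:
        return value
    return table[max(matches, key=len)]


def rename_first_row(lines, table):
    """Rename every tab-delimited field of the first line; later lines are unchanged."""
    lines = list(lines)
    if not lines:
        return lines
    fields = lines[0].rstrip("\n").split("\t")
    header = "\t".join(rename_value(field, table) for field in fields) + "\n"
    return [header] + lines[1:]
```

Applied to the header of 'file1.txt' with that hash table, this reproduces the renamed header 'LocusID  Reference  B6864_OR_human_VGIIa_2004  B7422_OR_cat_VGIIa_2009 ...' shown above.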


Fasta:

The fasta argument renames the display ID of each contig in the given FASTA file.


Renaming the contig ids for fasta file 'file1.fasta' can be done with the following example:

$ renameSamples -a --old Old_Names --new New_Names /directory1/rename.xlsx /directory1/file1.fasta


This would create a new file 'Renamed_file1.fasta', with the contig IDs renamed, in the same directory as the original file 'file1.fasta'.


For example, the original FASTA file 'file1.fasta' may contain the following data:

>NZ_CHKO01000053.1 Staphylococcus aureus strain USFL094, whole genome shotgun sequence
TGTCTTATTTTTTTAAAGTATTTAAAAGTAAAATTACATGTTAATACGTAGTATTAATGGCGAGACTCCT
GAGGGAGCAGTGCCAGTCGAAGACAGGGGCCCCAACACAGAAGCTGACATATAGTCAGCTTACAACAATG
...


The resulting file 'Renamed_file1.fasta' may then contain the following data, based on the hash table built from the 'rename.xlsx' workbook:

>NZ_CHKO01000053.1
TGTCTTATTTTTTTAAAGTATTTAAAAGTAAAATTACATGTTAATACGTAGTATTAATGGCGAGACTCCT
GAGGGAGCAGTGCCAGTCGAAGACAGGGGCCCCAACACAGAAGCTGACATATAGTCAGCTTACAACAATG
...
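
In the example above, the header keeps the ID 'NZ_CHKO01000053.1' but loses its description. The Python sketch below mirrors that: for each '>' header line it writes only the contig ID, renamed when the ID starts with a hash table key and otherwise kept as-is, and passes sequence lines through untouched. rename_value and rename_fasta are hypothetical names, and the starts-with matching rule is inferred from the Row example rather than documented.

```python
def rename_value(value, table):
    """Return the new name for value, or value unchanged when no key matches.

    Assumption (inferred from the examples): a value is replaced when it starts
    with a key; if several keys match, the longest one wins.
    """
    matches = [key for key in table if key and value.startswith(key)]
    if not matches:
        return value
    return table[max(matches, key=len)]


def rename_fasta(lines, table):
    """Rename the ID on each '>' header line; sequence lines pass through unchanged.

    The description after the ID is dropped, as in the example output.
    """
    renamed = []
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split(None, 1)  # [ID, description] or [ID]
            contig_id = words[0] if words else ""
            renamed.append(">" + rename_value(contig_id, table) + "\n")
        else:
            renamed.append(line)
    return renamed
```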


Files:

The files argument renames the files in a given directory. Renaming all of the files in '/directory1' can be done with the following example:

$ renameSamples -f --old Old_Names --new New_Names /directory1/rename.xlsx /directory1


This would rename each file in the given directory according to the hash table.


For example, the directory '/directory1' may originally contain the following files:

file1.txt
file2.txt
file3.txt
file4.fasta
file5.fasta


After renaming, '/directory1' may contain the following filenames, based on the hash table built from the 'rename.xlsx' workbook:

renamed_file1.txt
renamed_file2.txt
renamed_file3.txt
renamed_file4.fasta
renamed_file5.fasta
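
The filenames above are placeholders, and this section does not state how a hash table entry maps onto a file name. The Python sketch below assumes the part of the name before the extension is looked up (a value is renamed when it starts with a key, as the Row example suggests) and that the extension is kept. plan_renames, rename_files, and rename_value are hypothetical names. Building the full plan before touching any file lets a collision be reported before anything is renamed.

```python
import os


def rename_value(value, table):
    """Return the new name for value, or value unchanged when no key matches.

    Assumption (inferred from the examples): a value is replaced when it starts
    with a key; if several keys match, the longest one wins.
    """
    matches = [key for key in table if key and value.startswith(key)]
    if not matches:
        return value
    return table[max(matches, key=len)]


def plan_renames(names, table):
    """Pair each file name with its new name, keeping the extension (an assumption)."""
    plan = []
    targets = set()
    for name in sorted(names):
        stem, ext = os.path.splitext(name)
        new_name = rename_value(stem, table) + ext
        if new_name == name:
            continue  # no key matched; leave the file alone
        if new_name in targets:
            raise ValueError(f"two files would be renamed to {new_name}")
        targets.add(new_name)
        plan.append((name, new_name))
    return plan


def rename_files(directory, table):
    """Apply the plan to the regular files in directory without overwriting anything."""
    names = [n for n in os.listdir(directory) if os.path.isfile(os.path.join(directory, n))]
    plan = plan_renames(names, table)
    for old_name, new_name in plan:
        new_path = os.path.join(directory, new_name)
        if os.path.exists(new_path):
            raise FileExistsError(new_path)
        os.rename(os.path.join(directory, old_name), new_path)
    return plan
```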


Sheet Argument (optional)

Sheet:

The sheet argument selects which spreadsheet in the imported workbook is read. This argument is completely optional. If no value is given for sheet, it defaults to the first spreadsheet in the workbook.


For example, suppose a workbook contains two spreadsheets, 'Sheet1' followed by 'renameSheet'.


The sheet 'renameSheet' can be used to build the hash table with the following command line:

$ renameSamples -a --sheet renameSheet --old Old_Names --new New_Names /directory1/rename.xlsx /directory1/file1.fasta


Running the same command without the sheet argument would make 'sheet' default to 'Sheet1', the first spreadsheet in this workbook.
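
The default rule can be written as a small Python function. It is a sketch: select_sheet is a hypothetical helper that takes the sheet names in workbook order, for example the list returned by the sheetnames attribute of a workbook opened with the third-party openpyxl library. Whether renameSamples itself uses openpyxl is not stated here.

```python
def select_sheet(sheet_names, requested=None):
    """Return the spreadsheet to read: the requested one, or the first by default."""
    sheet_names = list(sheet_names)
    if not sheet_names:
        raise ValueError("workbook contains no spreadsheets")
    if requested is None:
        return sheet_names[0]  # no --sheet given: use the first spreadsheet
    if requested not in sheet_names:
        raise ValueError(f"sheet {requested!r} not found; available: {', '.join(sheet_names)}")
    return requested
```

For the workbook above, select_sheet(["Sheet1", "renameSheet"]) returns 'Sheet1', and passing "renameSheet" as the second argument returns 'renameSheet'.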


Old & New Name Arguments

The old and new arguments are both required and are used to build the hash table. Their values are matched against the header names in the first row of the imported Excel spreadsheet to determine which columns supply the keys and which column supplies the values of the hash table.


Old Name:

The old argument is required and may be given more than once. Each value names a column, by its header in the first row of the imported Excel spreadsheet, whose entries are added to the hash table as keys.


The following is an example of running the renameSamples script with only one old argument:

$ renameSamples -a --sheet renameSheet --old Old_Names --new New_Names /directory1/rename.xlsx /directory1/file1.fasta


The following is an example of running the renameSamples script with multiple old arguments:

$ renameSamples -a --sheet renameSheet --old Old_Names --old More_Old_Names --old OLDNAMES --new New_Names /directory1/rename.xlsx /directory1/file1.fasta


New Name:

The new argument is required and accepts only a single value. That value names the column of the imported Excel spreadsheet whose entries become the values of the hash table. Examples of the new argument appear in the commands above.


Book Argument

Excel xlsx Format Workbook:

The book argument is required and provides the location of the excel workbook to be imported into the script. The provided workbook must be in xlsx format.


Workbooks in the older xls format are not supported. To convert one, open it in Microsoft Excel, choose Save As..., and save a new copy in the xlsx format.


Hash Table:

The hash table in this script is used to rename pieces of data in a document, or any number of files in a given directory. The book argument imports an Excel workbook, and the optional sheet argument selects which spreadsheet is read; if the sheet argument is not provided, the first spreadsheet in the workbook is used. The old and new arguments determine which columns of that spreadsheet supply the keys and the values of the hash table.


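For example, suppose the imported spreadsheet has an 'Old_Names' column listing sample IDs such as 'Cg-B6864' and a 'New_Names' column listing the corresponding new names, and the script is run with the old and new arguments set to 'Old_Names' and 'New_Names' in the following command line: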

$ renameSamples -a --sheet renameSheet --old Old_Names --new New_Names /directory1/rename.xlsx /directory1/file1.fasta


The resulting hash table would be as follows:

{'Old_Names':'New_Names', 'Cg-B6864':'B6864_OR_human_VGIIa_2004', 'Cg-B7422':'B7422_OR_cat_VGIIa_2009', 'Cg-B7434':'B7434_OR_human_VGIIc_2008', 'Cg-B7436':'B7436_CA_alpaca_VGIIa_2009', 'Cg-B7491':'B7491_OR_human_VGIIc_2009', 'Cg-B7493':'B7493_OR_sheep_VGIIc_2009'}
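
The Python sketch below builds such a table from spreadsheet rows given as tuples, the shape produced by openpyxl's iter_rows(values_only=True); that is an assumption about how the workbook might be read, since the script's own reader is not described here. build_rename_table is a hypothetical name. It accepts a list of old headers so that each repeated --old argument adds another key column, and it includes the header row itself, which is where the 'Old_Names':'New_Names' entry above comes from.

```python
def build_rename_table(rows, old_headers, new_header):
    """Build {old name: new name} from spreadsheet rows.

    rows[0] is the header row. Every column named in old_headers supplies keys,
    and the column named new_header supplies the value on the same row. The
    header row itself is included, as in the documented hash table.
    """
    rows = [list(row) for row in rows]
    if not rows:
        return {}
    header = rows[0]
    for name in list(old_headers) + [new_header]:
        if name not in header:
            raise ValueError(f"{name!r} is not in the first row of the spreadsheet")
    new_col = header.index(new_header)
    old_cols = [header.index(name) for name in old_headers]
    table = {}
    for row in rows:
        new_value = row[new_col] if new_col < len(row) else None
        if new_value is None:
            continue  # empty new-name cell: nothing to rename to
        for col in old_cols:
            key = row[col] if col < len(row) else None
            if key is not None:
                table[str(key)] = str(new_value)
    return table
```

With hypothetical rows ('Old_Names', 'New_Names'), ('Cg-B6864', 'B6864_OR_human_VGIIa_2004'), and so on for the remaining pairs, build_rename_table(rows, ['Old_Names'], 'New_Names') returns the dictionary shown above.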


Location Argument

The location argument is required and gives the path of the file whose data will be renamed, or of the directory whose files will be renamed. A file must be given if the column, row, or fasta argument is used; a directory must be given if the files argument is used.


File:

The following is an example of row data being renamed in the file 'file1.txt':

$ renameSamples -r --old Old_Names --new New_Names /directory1/rename.xlsx /directory1/file1.txt


Directory:

The following is an example of the files argument being used to rename files in the directory '/directory1':

$ renameSamples -f --old Old_Names --new New_Names /directory1/rename.xlsx /directory1