10. Reference
10.1. Command line options for sequescan
The arguments for sequescan can be listed by typing:
~/sequedex/bin/sequescan run -h
at the command line, resulting in the following output:
Command line execution of sequescan run mode:
sequescan run [-h] [-q] [-c config_file] -d data_module [-o output_directory] [-s function_set]
[-a min_prot_frag_length] [-t thread_num] [-f database_writer_flag] [-l] INFILE
Example:
sequescan run -d Life2550-4GB.0 -s seed_0911.m1 -f 1 /Users/jsmith/mgData
Option descriptions:
-h mode help
-q quiet option - less messages to console or progress window
-c user-defined configuration file (overrides system configuration file)
-d name of data module
-o user-defined directory for data output (default is directory where input is located)
-s name of function set
-a minimum protein fragment length (overrides configuration file; default is 15)
-t maximum number of threads in threadpool (default = 1)
-f database writer flag (arguments: 0 = no, 1 = yes); analysis_writer_list in config determines type of database
(currently fasta/fastq file)
-l required if INFILE contains list of fasta/fastq files; if argument is "none", the list contains absolute paths;
otherwise argument is base directory and paths in list are relative to base directory;
when paths are relative to base directory and the -o option is set, output will include relative paths
INFILE may be a fasta/fastq file, a directory with fasta/fastq files, or a file containing a list of fasta/fastq files.
However, only fasta or fastq files or their gzipped (.gz) versions with an extension in the config file parameter
fa_ext_list will be processed.
Parameters:
INFILE is the input file in FASTA or FASTQ format. Complete paths, relative paths, and symlinks
may be used here.
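When sequescan is driven from a script, it can help to assemble the argument list programmatically. The following is a minimal Python sketch, not part of Sequedex: the helper name build_sequescan_argv and its keyword names are hypothetical, and only flags listed in the help output above are used. The -c and -l options are left out for brevity.

```python
import shlex
from typing import List, Optional


def build_sequescan_argv(
    infile: str,
    data_module: str,
    function_set: Optional[str] = None,
    output_dir: Optional[str] = None,
    min_frag_len: Optional[int] = None,
    threads: Optional[int] = None,
    write_db: Optional[bool] = None,
    quiet: bool = False,
    executable: str = "sequescan",
) -> List[str]:
    """Assemble an argument list for `sequescan run` (hypothetical helper)."""
    argv = [executable, "run"]
    if quiet:
        argv.append("-q")
    argv += ["-d", data_module]  # -d is the only option without brackets in the usage line
    if output_dir is not None:
        argv += ["-o", output_dir]
    if function_set is not None:
        argv += ["-s", function_set]
    if min_frag_len is not None:
        argv += ["-a", str(min_frag_len)]
    if threads is not None:
        argv += ["-t", str(threads)]
    if write_db is not None:
        argv += ["-f", "1" if write_db else "0"]  # 0 = no, 1 = yes
    argv.append(infile)  # INFILE always comes last
    return argv


# Reproduces the example from the help output.
argv = build_sequescan_argv(
    "/Users/jsmith/mgData",
    "Life2550-4GB.0",
    function_set="seed_0911.m1",
    write_db=True,
)
print(shlex.join(argv))
# prints: sequescan run -d Life2550-4GB.0 -s seed_0911.m1 -f 1 /Users/jsmith/mgData
```

The resulting list can be passed to subprocess.run without a shell, which avoids quoting problems when paths contain spaces.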
10.2. Environmental variables
The environment variables can be seen by typing:
sequedex-config
producing:
Global Configuration Variables:
SEQUEDEX_USERDIR (user directory location) set to "/home/localhost/username/.sqdx/" from default.
SEQUEDEX_LOGLEVEL (debug|info|warning|error) set to "info" from default.
SEQUEDEX_LOGFILE_DIR (location of launcher log file) set to "/home/localhost/username/.sqdx/" from default.
SEQUEDEX_SEES_STDOUT (if False, pop up windows) set to "True" from default.
SEQUEDEX_HOME (top-level installation directory) set to "/home/localhost/username/sequedex/" from location of launcher binary.
SEQUEDEX_ETC (location of system config files) set to "/home/localhost/username/sequedex/etc/" from default.
SEQUEDEX_LIB (location of library files) set to "/home/localhost/username/sequedex/lib/" from default.
SEQUEDEX_BIN (location of executable files) set to "/home/localhost/username/sequedex/bin/" from default.
SEQUEDEX_DATA (location of data modules) set to "/home/localhost/username/sequedex/data/" from default.
SEQUEDEX_DOC (location of documentation and help files) set to "/home/localhost/username/sequedex/doc/" from default.
SEQUEDEX_CONTRIB (location of contributed files and data) set to "/home/localhost/username/sequedex/contrib/" from default.
SEQUEDEX_JAVA (absolute path to java executable) set to "/usr/bin/java" from output of "which java".
SEQUEDEX_CHECK_JAVA_VERSION (if True, check java version) set to "True" from default.
SEQUEDEX_REQUIRE_JAVA_VERSION (java version must be above this) set to "7" from default.
SEQUEDEX_PLATFORM_MEMSIZE (amount of system RAM in GB) set to "31" from reported system memory size.
SEQUEDEX_PYTHON (absolute path to python executable) set to "/usr/bin/python" from first python in PATH that has scipy.
SEQUINATOR_COMMAND (command run upon "Display Output") set to "/home/localhost/username/sequedex/bin/sequinator" from default.
SEQUINATOR_BROWSER (path to browser, with arguments) set to "/usr/bin/firefox" from plat-dependent default.
SEQUINATOR_SERVER (write and serve files) set to "True" from default.
SEQUINATOR_CLIENT (start browser) set to "True" from default.
SEQUEDEX_HAS_INTERNET (has web access for update) set to "True" from default.
SEQUINATOR_DATA (location of data library) set to "/home/localhost/username/sequedex/data/sequinator/" from default.
SEQUINATOR_SEARCH_PATH (path searched for output) set to "/home/localhost/username/:/home/localhost/username/sequedex/data/sequinator/" from default.
SEQUINATOR_HOST (IP address for sequinator to use) set to "127.0.0.1" from default.
SEQUINATOR_PORT (IP port for sequinator to use) set to "52707" from default.
SEQUINATOR_MAX_BROWSER_TABS (max tabs for sequinator to open) set to "10" from default.
SEQUINATOR_TITLE (format string for sequinator titles) set to "%(filename)s" from default.
SEQUEDEX_LAUNCHER_VERSION (sequedex-launcher version number) set to "1.0.10" from internal version number.
SEQUEDEX_LAUNCHER_DEFAULT_COMMAND (default command) set to "sequescan" from default.
SEQUEDEX_WWW (URL for updates and sequinator) set to "http://sequedex.lanl.gov" from default.
http_proxy (system http proxy setting) set to "None" from default.
ARCHY_ETC (location of Archaeoptryx config file) set to "/home/localhost/username/sequedex/etc/archy/" from default.
ARCHY_DEFAULT_TREE (default tree for archy) set to "/home/localhost/username/sequedex/data/trees/Life2550/tree.phyloxml" from default.
SEQUESCAN_ETC (location of sequescan conf files) set to "/home/localhost/username/sequedex/etc/sequescan" from default.
SEQUESCAN_HEAPSIZE_MB (java heapsize in MB for sequescan) set to "30000" from user config file.
SEQUESCAN_JAVA_ARGS (full java arguments to sequescan) set to "-Xms1000m -Xmx30000m" from platform-dependent default.
Data Module Configuration Variables:
Module virus1252:
minimumMemSize: 4
moduleName: virus1252
installDate: Sat Jun 21 08:30:56 2014
filename: virus1252.1.jar
version: 1
nextMemSize: 1000
Module Life2550:
minimumMemSize: 16
moduleName: Life2550
installDate: Sat Jun 21 08:30:55 2014
filename: Life2550-16GB.0.jar
version: 0
nextMemSize: 32
Use the sequedex-config command for setting these variables.
To check a specific environmental variable, add the variable as an argument to sequedex-config:
sequedex-config SEQUESCAN_HEAPSIZE_MB
and to change a particular variable, provide both the variable name and a new value:
sequedex-config SEQUESCAN_HEAPSIZE_MB 3000
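The global variable lines printed by sequedex-config follow a regular pattern: a name, a description in parentheses, a quoted value, and the source of the setting. A minimal Python sketch for reading them into a dictionary follows. The names ConfigVar and parse_global_variables are hypothetical, and the pattern is inferred only from the listing above, so other versions of the launcher may print a different layout.

```python
import re
from typing import Dict, NamedTuple


class ConfigVar(NamedTuple):
    description: str
    value: str
    source: str


# Pattern inferred from the listing: NAME (description) set to "value" from source.
_LINE_RE = re.compile(
    r'^(?P<name>\w+) \((?P<desc>.*)\) set to "(?P<value>.*)" from (?P<source>.*?)\.?$'
)

# Three lines copied from the listing above.
SAMPLE_OUTPUT = """\
Global Configuration Variables:
SEQUEDEX_LOGLEVEL (debug|info|warning|error) set to "info" from default.
SEQUEDEX_JAVA (absolute path to java executable) set to "/usr/bin/java" from output of "which java".
SEQUESCAN_HEAPSIZE_MB (java heapsize in MB for sequescan) set to "30000" from user config file.
"""


def parse_global_variables(text: str) -> Dict[str, ConfigVar]:
    """Collect every line matching the pattern; other lines are skipped."""
    variables = {}
    for line in text.splitlines():
        match = _LINE_RE.match(line.strip())
        if match:
            variables[match["name"]] = ConfigVar(
                match["desc"], match["value"], match["source"]
            )
    return variables


variables = parse_global_variables(SAMPLE_OUTPUT)
print(variables["SEQUESCAN_HEAPSIZE_MB"].value)   # prints: 30000
print(variables["SEQUESCAN_HEAPSIZE_MB"].source)  # prints: user config file
```

In practice the text would come from running sequedex-config, for example with subprocess.run and capture_output=True. The data module section uses a different "key: value" layout and is ignored by this sketch.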
10.3. Configuration options
The configuration options for sequescan can be seen by typing:
cat sequedex/etc/sequescan/sequescan.conf
producing:
; Windows style configuration file (INI file) where
; Properties: name=value
; Sections: [section]
; created by M. Bussod
; last modified 03/18/2014 by J. Cohn
; should always be current version
; Parameters in this configuration file always evaluate to string values unless the parameter
; name ends in one of the following suffixes:
; *_int: evaluates to an integer type.
; *_bool: evaluates to a boolean type.
; *_float: evaluates to floating type.
; *_intList: evaluates to a comma-separated list of integer types.
; *_boolList: evaluates to a comma-separated list of boolean types.
; *_floatList: evaluates to comma-separated list of floating types.
; extensions of possible output files (files generated during a run depend upon options selected):
; .tsv - tab-separated files as documented in the Sequescan design document.
; .fa - nucleic acid fasta file of matching reading frames
; .fq - nucleic acid fastq file of matching reading frames
; .log - logging file.
; .json - JSON file
; .json and .tsv files for the most part have the same content in different formats
; runtime values used by this config file
; DBNAME ; signature data module name for this analysis from the -d option
; SCHEME ; function scheme for this analysis from the -s option
; environmental variables used by this config file (if not set, launcher script will set default values)
; SEQUEDEX_HOME ; path to the Sequedex distribution directory
; SEQUINATOR_COMMAND ; full path to the command which launches the sequinator program (currently a javascript application)
; SEQUEDEX_USERDIR ; default is currently ~/.sqdx on Linux and Mac
; SEQUESCAN_ETC ; path to the Sequescan etc directory (which is where, for example, the default sequescan.conf file is located)
; Strings enclosed in matching pairs of percent signs will be passed for environmental variable expansion.
; Where paths are relative in config file, on Linux these will be relative to working directory
; when using Mac app, they will be relative to /? (since working directory for app is /)
[global]
nCPUS_int=1 ; allows usage of n processing threads (if valid license only); default is 1;
; overridden by command line option
min_prot_frag_len_int=15 ; minimum length of protein fragments in amino acids between stop codons
; overridden by command line option -a
config_file_version=1.0 ; config file version number
[licensed_features]
system_license_file=%SEQUESCAN_ETC%/license.lic
; if user cannot write to this location,
; license should be installed in $SEQUEDEX_USERDIR/license.lic
; gui will install license in $SEQUEDEX_USERDIR/license.lic
; program will always look for license in $SEQUEDEX_USERDIR first, then in system_license_file
write_db_bool=F ; analysis writer(s) are only added if set to T (true)
; write_db_bool=T is ignored if no valid license
; overridden by command line option
[input]
fasta_ext_list=fasta,fst,fna,fas,ffn,fa,fastq,fq ; allowed file extensions for input sequencing files
; fastq,fq are treated as fastq files - all others as fasta
[output]
log_dir=log ; output directory for sequescan log file relative to the top-level output directory
out_dir_ext=sqdx ; extension for lowest level output directory
who=who-%DBNAME% ; count of reads assigned to each interior nodes.
what=what-%DBNAME%x%SCHEME% ; fractional count of reads assigned to functional scheme classifications.
whoDoesWhat=wdw-%DBNAME%x%SCHEME% ; matrix of fractional counts of reads assigned by functional scheme and phylogeny.
; Rows are scheme classifications columns are phylogeny.
; Sum across columns is the what vector.
stats=%DBNAME%-stats ; file of general and phylogeny statistics needed for normalization
whatstats=whatstats-%DBNAME%x%SCHEME%
db=db-%DBNAME%x%SCHEME% ; core name for database output file (currently only options are for fasta/fastq output)
; if run with no functions, %SCHEME% will be substituted with "none"
progress_writer_list=gov.lanl.sequtils.writer.ProgressFileWriterJ,gov.lanl.sequtils.writer.ProgressFileWriterT
;list of summary/stats file writers to be used for each sequence file analyzed
analysis_writer_list=gov.lanl.sequtils.writer.SequencingFileWriter
; list of analysis writers to be used for each sequence file analyzed
analysis_output_type=same_as_input ; analysis writer output type - same_as_input, fasta (.fa), fastq (.fq)
; currently fastq will only work with fastq input but can force fasta output from fastq input
analysis_top_node_int=0 ; analysis writer output will include reads assigned to this node
; and all children nodes under it - 0 means all nodes - not yet working but will be overridable by runtime argument
progress_interval_long=1000000 ; if set to 0, will print summary statistics at end of processing for a particular fasta file
; if n > 0, summary statistics will be written after every n reads have been processed.
result_display_path=%SEQUINATOR_COMMAND%
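The comments in sequescan.conf describe two conventions: a parameter's type is taken from its name suffix, and strings between percent signs are expanded from environment or runtime values. The Python sketch below illustrates both on a shortened copy of the file. It is not the Sequescan parser: the function names, the acceptance of TRUE/FALSE alongside T/F, and the values given for SEQUESCAN_ETC, DBNAME, and SCHEME are assumptions made for illustration.

```python
import configparser
import re
from typing import Any, Dict, Mapping

# Shortened copy of the listing above.
SAMPLE_CONF = """\
[global]
nCPUS_int=1 ; allows usage of n processing threads
min_prot_frag_len_int=15 ; minimum length of protein fragments
config_file_version=1.0 ; config file version number
[licensed_features]
system_license_file=%SEQUESCAN_ETC%/license.lic
write_db_bool=F ; analysis writer(s) are only added if set to T
[output]
who=who-%DBNAME% ; count of reads assigned to each interior node
what=what-%DBNAME%x%SCHEME%
"""

_PERCENT_VAR = re.compile(r"%([A-Za-z_][A-Za-z0-9_]*)%")


def expand_percent_vars(text: str, variables: Mapping[str, str]) -> str:
    """Replace %NAME% with variables[NAME]; unknown names are left untouched."""
    return _PERCENT_VAR.sub(lambda m: variables.get(m.group(1), m.group(0)), text)


def _to_bool(text: str) -> bool:
    # The file shows only T and F; accepting TRUE/FALSE as well is an assumption.
    upper = text.strip().upper()
    if upper in ("T", "TRUE"):
        return True
    if upper in ("F", "FALSE"):
        return False
    raise ValueError(f"not a boolean: {text!r}")


_CONVERTERS = {"int": int, "bool": _to_bool, "float": float}


def typed_value(name: str, raw: str) -> Any:
    """Convert raw according to the suffix of name; the default is a string."""
    for kind, convert in _CONVERTERS.items():
        if name.endswith(f"_{kind}"):
            return convert(raw)
        if name.endswith(f"_{kind}List"):
            return [convert(item) for item in raw.split(",")]
    return raw


def load_conf(text: str, variables: Mapping[str, str]) -> Dict[str, Dict[str, Any]]:
    parser = configparser.ConfigParser(
        comment_prefixes=(";",), inline_comment_prefixes=(";",), interpolation=None
    )
    parser.optionxform = str  # keep names case-sensitive (nCPUS_int, *_intList)
    parser.read_string(text)
    return {
        section: {
            name: typed_value(name, expand_percent_vars(raw, variables))
            for name, raw in parser.items(section)
        }
        for section in parser.sections()
    }


# Illustrative values only; real values come from the launcher and the -d/-s options.
runtime = {
    "SEQUESCAN_ETC": "/opt/sequedex/etc/sequescan",
    "DBNAME": "Life2550",
    "SCHEME": "seed_0911.m1",
}
conf = load_conf(SAMPLE_CONF, runtime)
print(conf["global"]["nCPUS_int"])  # prints: 1
print(conf["output"]["what"])       # prints: what-Life2550xseed_0911.m1
```

Under this sketch, a parameter such as progress_interval_long stays a string, because _long is not among the suffixes documented in the file's header comments.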