10. Reference

10.1. Command line options for sequescan

The arguments for sequescan can be listed by typing:

~/sequedex/bin/sequescan run -h

at the comand line, resulting in the following output:

Command line execution of sequescan run mode:
sequescan run [-h] [-q] [-c config_file] -d data_module [-o output_directory] [-s function_set]
          [-a min_prot_frag_length] [-t thread_num] [-f database_writer_flag] [-l] INFILE

Example:
sequescan run -d Life2550-4GB.0 -s seed_0911.m1 -f 1 /Users/jsmith/mgData

Option descriptions:
-h    mode help
-q    quiet option - less messages to console or progress window
-c    user-defined configuration file (overrides system configuation file)
-d    name of data module
-o    user-defined directory for data output (default is directory where input is located)
-s    name of function set
-a    minimum protein fragment length (overrides configuration file; default is 15)
-t    maximum number of threads in threadpool (default = 1)
-f    database writer flag (arguments:  0 = no, 1 = yes); analysis_writer_list in config determines type of database
      (currently fasta/fastq file)
-l    required if INFILE contains list of fasta/fastq files; if argument is "none", the list contains absolute paths;
      otherwise argument is base directory and paths in list are relative to base directory;
      when paths are relative to base directory and the -o option is set, output will inlcude relative paths

INFILE may be a fasta/fastq file, a directory with fasta/fastq files, or a file containing a list of fasta/fastq files.
However, only fasta or fastq files or their gzipped (.gz) versions with an extension in the config file parameter
fa_ext_list will be processed.

Parameters:
  INFILE is the input file in FASTA format.  Complete paths, relatives paths, and symlinks
  may be used here.

10.2. Environmental variables

The environment variables can be seen by typing:

sequedex-config

producing:

Global Configuration Variables:
                SEQUEDEX_USERDIR (user directory location) set to "/home/localhost/username/.sqdx/" from default.
               SEQUEDEX_LOGLEVEL (debug|info|warning|error) set to "info" from default.
            SEQUEDEX_LOGFILE_DIR (location of launcher log file) set to "/home/localhost/username/.sqdx/" from default.
            SEQUEDEX_SEES_STDOUT (if False, pop up windows) set to "True" from default.
                   SEQUEDEX_HOME (top-level installation directory) set to "/home/localhost/username/sequedex/" from location of launcher binary.
                    SEQUEDEX_ETC (location of system config files) set to "/home/localhost/username/sequedex/etc/" from default.
                    SEQUEDEX_LIB (location of library files) set to "/home/localhost/username/sequedex/lib/" from default.
                    SEQUEDEX_BIN (location of executable files) set to "/home/localhost/username/sequedex/bin/" from default.
                   SEQUEDEX_DATA (location of data modules) set to "/home/localhost/username/sequedex/data/" from default.
                    SEQUEDEX_DOC (location of documentation and help files) set to "/home/localhost/username/sequedex/doc/" from default.
                SEQUEDEX_CONTRIB (location of contributed files and data) set to "/home/localhost/username/sequedex/contrib/" from default.
                   SEQUEDEX_JAVA (absolute path to java executable) set to "/usr/bin/java" from output of "which java".
     SEQUEDEX_CHECK_JAVA_VERSION (if True, check java version) set to "True" from default.
   SEQUEDEX_REQUIRE_JAVA_VERSION (java version must be above this) set to "7" from default.
       SEQUEDEX_PLATFORM_MEMSIZE (amount of system RAM in GB) set to "31" from reported system memory size.
                 SEQUEDEX_PYTHON (absolute path to python executable) set to "/usr/bin/python" from first python in PATH that has scipy.
              SEQUINATOR_COMMAND (command run upon "Display Output") set to "/home/localhost/username/sequedex/bin/sequinator" from default.
              SEQUINATOR_BROWSER (path to browser, with arguments) set to "/usr/bin/firefox" from plat-dependent default.
               SEQUINATOR_SERVER (write and serve files) set to "True" from default.
               SEQUINATOR_CLIENT (start browser) set to "True" from default.
           SEQUEDEX_HAS_INTERNET (has web access for update) set to "True" from default.
                 SEQUINATOR_DATA (location of data library) set to "/home/localhost/username/sequedex/data/sequinator/" from default.
          SEQUINATOR_SEARCH_PATH (path searched for output) set to "/home/localhost/username/:/home/localhost/username/sequedex/data/sequinator/" from default.
                 SEQUINATOR_HOST (IP address for sequinator to use) set to "127.0.0.1" from default.
                 SEQUINATOR_PORT (IP port for sequinator to use) set to "52707" from default.
     SEQUINATOR_MAX_BROWSER_TABS (max tabs for sequinator to open) set to "10" from default.
                SEQUINATOR_TITLE (format string for sequinator titles) set to "%(filename)s" from default.
       SEQUEDEX_LAUNCHER_VERSION (sequedex-launcher version number) set to "1.0.10" from internal version number.
SEQUEDEX_LAUNCHER_DEFAULT_COMMAND (default command) set to "sequescan" from default.
                    SEQUEDEX_WWW (URL for updates and sequinator) set to "http://sequedex.lanl.gov" from default.
                      http_proxy (system http proxy setting) set to "None" from default.
                       ARCHY_ETC (location of Archaeoptryx config file) set to "/home/localhost/username/sequedex/etc/archy/" from default.
              ARCHY_DEFAULT_TREE (default tree for archy) set to "/home/localhost/username/sequedex/data/trees/Life2550/tree.phyloxml" from default.
                   SEQUESCAN_ETC (location of sequescan conf files) set to "/home/localhost/username/sequedex/etc/sequescan" from default.
           SEQUESCAN_HEAPSIZE_MB (java heapsize in MB for sequescan) set to "30000" from user config file.
             SEQUESCAN_JAVA_ARGS (full java arguments to sequescan) set to "-Xms1000m -Xmx30000m" from platform-dependent default.
Data Module Configuration Variables:
  Module virus1252:
                   minimumMemSize: 4
                       moduleName: virus1252
                      installDate: Sat Jun 21 08:30:56 2014
                         filename: virus1252.1.jar
                          version: 1
                      nextMemSize: 1000
  Module Life2550:
                   minimumMemSize: 16
                       moduleName: Life2550
                      installDate: Sat Jun 21 08:30:55 2014
                         filename: Life2550-16GB.0.jar
                          version: 0
                      nextMemSize: 32

Use the sequedex-config command for setting these variables.

To check a specific environmental variable, add the variable as an argument to sequedex-config:

sequedex-config SEQUESCAN_HEAPSIZE

and to change a particular variable, provide both the variable name and a new value:

sequedex-config SEQUESCAN_HEAPSIZE 3000

10.3. Configuration options

The configuration options for sequescan can be seen by typing:

cat sequedex/etc/sequescan/sequescan.conf

producing:

; Windows style configuration file (INI file) where
; Properties:  name=value
; Sections:    [section]
; created by M. Bussod
; last modified 03/18/2014 by J. Cohn
; should always be current version


; Parameters in this configuration file always evaluate to string values unless the parameter
; name ends in one of the following suffiexes:
;    *_int:     evaluates to an integer type.
;    *_bool:     evaluates to a boolean type.
;    *_float:     evaluates to floating type.
;    *_intList:     evaluates to a comma-separated list of integer types.
;    *_boolList:     evaluates to a comma-separated list of boolean types.
;    *_floatList:     evaluates to comma-separated list of floating types.

; extensions of possible output files (files generated during a run depend upon options selected):
;    .tsv - tab-separated files as documented in the Sequescan design document.
;    .fa   - nucleic acid fasta file of matching reading frames
;    .fq   - nucleic acid fastq file of matching reading frames
;    .log - logging file.
;    .json - JSON file
;    .json and .tsv files for the most part have the same content in different formats


; runtime values used by this config file
;    DBNAME         ; signature data module name for this analysis from the -d option
;    SCHEME        ;  function scheme for this analysis from the -s option

; environmental variables used by this config file (if not set, launcher script will set default values)
;    SEQUEDEX_HOME      ; path to the Sequedex distribution directory
;    SEQUINATOR_COMMAND ; full path to the command which launches the sequinator program (currently a javascript application)
;    SEQUEDEX_USERDIR   ; default is currently ~/.sqdx on Linux and Mac
;    SEQUESCAN_ETC ; path to the Sequescan etc directory (which is where, for example, the default sequescan.conf file is located


; Strings enclosed in matching pairs of percent signs will be passed for environmental variable expansion.
; Where paths are relative in config file, on Linux these will be relative to working directory
; when using Mac app, they will be relative to /? (since working directory for app is /)


[global]
nCPUS_int=1       ; allows usage of n processing threads (if valid license only);  default is 1;
                 ; overridden by command line option
min_prot_frag_len_int=15  ; minimum length of protein fragments in amino acids between stop codons
                         ; overridden by command line option -a
config_file_version=1.0  ; config file version number


[licensed_features]
system_license_file=%SEQUESCAN_ETC%/license.lic
       ; if user cannot write to this location,
       ; license should be installed in $SEQUEDEX_USERDIR/license.lic
           ; gui will install license in $SEQUEDEX_USERDIR/license.lic
       ; program will always look for license in $SEQUEDEX_USERDIR first, then in system_license_file
write_db_bool=F   ; analysis writer(s) are only added if set to T (true)
                 ; write_db_bool=T is ignored if no valid license
                 ; overridden by command line option

[input]
fasta_ext_list=fasta,fst,fna,fas,ffn,fa,fastq,fq    ; allowed file extensions for input sequencing files
                                                                                                ; fastq,fq are treated as fastq files - all others as fasta

[output]
log_dir=log       ; output directory for sequescan log file relative to the top-level output directory
out_dir_ext=sqdx   ; extension for lowest level output directory
who=who-%DBNAME%         ; count of reads assigned to each interior nodes.
what=what-%DBNAME%x%SCHEME%  ; fractional count of reads assigned to functional scheme classifications.
whoDoesWhat=wdw-%DBNAME%x%SCHEME%   ; matrix of fractional counts of reads assigned by functional scheme and phylogeny.
     ; Rows are scheme classifications columns are phylogeny.
     ; Sum across columns is the what vector.
stats=%DBNAME%-stats            ; file of general and phylogeny statistics needed for normalization
whatstats=whatstats-%DBNAME%x%SCHEME%
db=db-%DBNAME%x%SCHEME%    ; core name for database output file (currently only options are for fasta/fastq output)
             ; if run with no functions, %SCHEME% will be substituted with "none"

progress_writer_list=gov.lanl.sequtils.writer.ProgressFileWriterJ,gov.lanl.sequtils.writer.ProgressFileWriterT
   ;list of summary/stats file writers to be used for each sequence file analyzed
analysis_writer_list=gov.lanl.sequtils.writer.SequencingFileWriter
   ; list of analysis writers to be used for each sequence file analyzed
analysis_output_type=same_as_input  ;  analysis writer output type - same_as_input, fasta (.fa), fastq (.fq)
      ; currently fastq will only work with fastq input but can force fasta output from fastq input
analysis_top_node_int=0     ; analysis writer output will include reads assigned to this node
      ; and all children nodes under it - 0 means all nodes - not yet working but will be overridable by runtime argument
progress_interval_long=1000000      ; if set to 0, will print summary statistics at end of processing for a particular fasta file
                           ; if n > 0, summary statistics will be written after every n reads have been processed.
result_display_path=%SEQUINATOR_COMMAND%