10. Reference

10.1. Command line options for sequescan

The arguments for sequescan can be listed by typing:

~/sequedex/bin/sequescan run -h

at the comand line, resulting in the following output:

Command line execution of sequescan run mode:
sequescan run [-h] [-q] [-c config_file] -d data_module [-o output_directory] [-s function_set]
          [-a min_prot_frag_length] [-t thread_num] [-f database_writer_flag] [-l] INFILE

sequescan run -d Life2550-4GB.0 -s seed_0911.m1 -f 1 /Users/jsmith/mgData

Option descriptions:
-h    mode help
-q    quiet option - less messages to console or progress window
-c    user-defined configuration file (overrides system configuation file)
-d    name of data module
-o    user-defined directory for data output (default is directory where input is located)
-s    name of function set
-a    minimum protein fragment length (overrides configuration file; default is 15)
-t    maximum number of threads in threadpool (default = 1)
-f    database writer flag (arguments:  0 = no, 1 = yes); analysis_writer_list in config determines type of database
      (currently fasta/fastq file)
-l    required if INFILE contains list of fasta/fastq files; if argument is "none", the list contains absolute paths;
      otherwise argument is base directory and paths in list are relative to base directory;
      when paths are relative to base directory and the -o option is set, output will inlcude relative paths

INFILE may be a fasta/fastq file, a directory with fasta/fastq files, or a file containing a list of fasta/fastq files.
However, only fasta or fastq files or their gzipped (.gz) versions with an extension in the config file parameter
fa_ext_list will be processed.

  INFILE is the input file in FASTA format.  Complete paths, relatives paths, and symlinks
  may be used here.

10.2. Environmental variables

The environment variables can be seen by typing:



Global Configuration Variables:
                SEQUEDEX_USERDIR (user directory location) set to "/home/localhost/username/.sqdx/" from default.
               SEQUEDEX_LOGLEVEL (debug|info|warning|error) set to "info" from default.
            SEQUEDEX_LOGFILE_DIR (location of launcher log file) set to "/home/localhost/username/.sqdx/" from default.
            SEQUEDEX_SEES_STDOUT (if False, pop up windows) set to "True" from default.
                   SEQUEDEX_HOME (top-level installation directory) set to "/home/localhost/username/sequedex/" from location of launcher binary.
                    SEQUEDEX_ETC (location of system config files) set to "/home/localhost/username/sequedex/etc/" from default.
                    SEQUEDEX_LIB (location of library files) set to "/home/localhost/username/sequedex/lib/" from default.
                    SEQUEDEX_BIN (location of executable files) set to "/home/localhost/username/sequedex/bin/" from default.
                   SEQUEDEX_DATA (location of data modules) set to "/home/localhost/username/sequedex/data/" from default.
                    SEQUEDEX_DOC (location of documentation and help files) set to "/home/localhost/username/sequedex/doc/" from default.
                SEQUEDEX_CONTRIB (location of contributed files and data) set to "/home/localhost/username/sequedex/contrib/" from default.
                   SEQUEDEX_JAVA (absolute path to java executable) set to "/usr/bin/java" from output of "which java".
     SEQUEDEX_CHECK_JAVA_VERSION (if True, check java version) set to "True" from default.
   SEQUEDEX_REQUIRE_JAVA_VERSION (java version must be above this) set to "7" from default.
       SEQUEDEX_PLATFORM_MEMSIZE (amount of system RAM in GB) set to "31" from reported system memory size.
                 SEQUEDEX_PYTHON (absolute path to python executable) set to "/usr/bin/python" from first python in PATH that has scipy.
              SEQUINATOR_COMMAND (command run upon "Display Output") set to "/home/localhost/username/sequedex/bin/sequinator" from default.
              SEQUINATOR_BROWSER (path to browser, with arguments) set to "/usr/bin/firefox" from plat-dependent default.
               SEQUINATOR_SERVER (write and serve files) set to "True" from default.
               SEQUINATOR_CLIENT (start browser) set to "True" from default.
           SEQUEDEX_HAS_INTERNET (has web access for update) set to "True" from default.
                 SEQUINATOR_DATA (location of data library) set to "/home/localhost/username/sequedex/data/sequinator/" from default.
          SEQUINATOR_SEARCH_PATH (path searched for output) set to "/home/localhost/username/:/home/localhost/username/sequedex/data/sequinator/" from default.
                 SEQUINATOR_HOST (IP address for sequinator to use) set to "" from default.
                 SEQUINATOR_PORT (IP port for sequinator to use) set to "52707" from default.
     SEQUINATOR_MAX_BROWSER_TABS (max tabs for sequinator to open) set to "10" from default.
                SEQUINATOR_TITLE (format string for sequinator titles) set to "%(filename)s" from default.
       SEQUEDEX_LAUNCHER_VERSION (sequedex-launcher version number) set to "1.0.10" from internal version number.
SEQUEDEX_LAUNCHER_DEFAULT_COMMAND (default command) set to "sequescan" from default.
                    SEQUEDEX_WWW (URL for updates and sequinator) set to "http://sequedex.lanl.gov" from default.
                      http_proxy (system http proxy setting) set to "None" from default.
                       ARCHY_ETC (location of Archaeoptryx config file) set to "/home/localhost/username/sequedex/etc/archy/" from default.
              ARCHY_DEFAULT_TREE (default tree for archy) set to "/home/localhost/username/sequedex/data/trees/Life2550/tree.phyloxml" from default.
                   SEQUESCAN_ETC (location of sequescan conf files) set to "/home/localhost/username/sequedex/etc/sequescan" from default.
           SEQUESCAN_HEAPSIZE_MB (java heapsize in MB for sequescan) set to "30000" from user config file.
             SEQUESCAN_JAVA_ARGS (full java arguments to sequescan) set to "-Xms1000m -Xmx30000m" from platform-dependent default.
Data Module Configuration Variables:
  Module virus1252:
                   minimumMemSize: 4
                       moduleName: virus1252
                      installDate: Sat Jun 21 08:30:56 2014
                         filename: virus1252.1.jar
                          version: 1
                      nextMemSize: 1000
  Module Life2550:
                   minimumMemSize: 16
                       moduleName: Life2550
                      installDate: Sat Jun 21 08:30:55 2014
                         filename: Life2550-16GB.0.jar
                          version: 0
                      nextMemSize: 32

Use the sequedex-config command for setting these variables.

To check a specific environmental variable, add the variable as an argument to sequedex-config:

sequedex-config SEQUESCAN_HEAPSIZE

and to change a particular variable, provide both the variable name and a new value:

sequedex-config SEQUESCAN_HEAPSIZE 3000

10.3. Configuration options

The configuration options for sequescan can be seen by typing:

cat sequedex/etc/sequescan/sequescan.conf


; Windows style configuration file (INI file) where
; Properties:  name=value
; Sections:    [section]
; created by M. Bussod
; last modified 03/18/2014 by J. Cohn
; should always be current version

; Parameters in this configuration file always evaluate to string values unless the parameter
; name ends in one of the following suffiexes:
;    *_int:     evaluates to an integer type.
;    *_bool:     evaluates to a boolean type.
;    *_float:     evaluates to floating type.
;    *_intList:     evaluates to a comma-separated list of integer types.
;    *_boolList:     evaluates to a comma-separated list of boolean types.
;    *_floatList:     evaluates to comma-separated list of floating types.

; extensions of possible output files (files generated during a run depend upon options selected):
;    .tsv - tab-separated files as documented in the Sequescan design document.
;    .fa   - nucleic acid fasta file of matching reading frames
;    .fq   - nucleic acid fastq file of matching reading frames
;    .log - logging file.
;    .json - JSON file
;    .json and .tsv files for the most part have the same content in different formats

; runtime values used by this config file
;    DBNAME         ; signature data module name for this analysis from the -d option
;    SCHEME        ;  function scheme for this analysis from the -s option

; environmental variables used by this config file (if not set, launcher script will set default values)
;    SEQUEDEX_HOME      ; path to the Sequedex distribution directory
;    SEQUINATOR_COMMAND ; full path to the command which launches the sequinator program (currently a javascript application)
;    SEQUEDEX_USERDIR   ; default is currently ~/.sqdx on Linux and Mac
;    SEQUESCAN_ETC ; path to the Sequescan etc directory (which is where, for example, the default sequescan.conf file is located

; Strings enclosed in matching pairs of percent signs will be passed for environmental variable expansion.
; Where paths are relative in config file, on Linux these will be relative to working directory
; when using Mac app, they will be relative to /? (since working directory for app is /)

nCPUS_int=1       ; allows usage of n processing threads (if valid license only);  default is 1;
                 ; overridden by command line option
min_prot_frag_len_int=15  ; minimum length of protein fragments in amino acids between stop codons
                         ; overridden by command line option -a
config_file_version=1.0  ; config file version number

       ; if user cannot write to this location,
       ; license should be installed in $SEQUEDEX_USERDIR/license.lic
           ; gui will install license in $SEQUEDEX_USERDIR/license.lic
       ; program will always look for license in $SEQUEDEX_USERDIR first, then in system_license_file
write_db_bool=F   ; analysis writer(s) are only added if set to T (true)
                 ; write_db_bool=T is ignored if no valid license
                 ; overridden by command line option

fasta_ext_list=fasta,fst,fna,fas,ffn,fa,fastq,fq    ; allowed file extensions for input sequencing files
                                                                                                ; fastq,fq are treated as fastq files - all others as fasta

log_dir=log       ; output directory for sequescan log file relative to the top-level output directory
out_dir_ext=sqdx   ; extension for lowest level output directory
who=who-%DBNAME%         ; count of reads assigned to each interior nodes.
what=what-%DBNAME%x%SCHEME%  ; fractional count of reads assigned to functional scheme classifications.
whoDoesWhat=wdw-%DBNAME%x%SCHEME%   ; matrix of fractional counts of reads assigned by functional scheme and phylogeny.
     ; Rows are scheme classifications columns are phylogeny.
     ; Sum across columns is the what vector.
stats=%DBNAME%-stats            ; file of general and phylogeny statistics needed for normalization
db=db-%DBNAME%x%SCHEME%    ; core name for database output file (currently only options are for fasta/fastq output)
             ; if run with no functions, %SCHEME% will be substituted with "none"

   ;list of summary/stats file writers to be used for each sequence file analyzed
   ; list of analysis writers to be used for each sequence file analyzed
analysis_output_type=same_as_input  ;  analysis writer output type - same_as_input, fasta (.fa), fastq (.fq)
      ; currently fastq will only work with fastq input but can force fasta output from fastq input
analysis_top_node_int=0     ; analysis writer output will include reads assigned to this node
      ; and all children nodes under it - 0 means all nodes - not yet working but will be overridable by runtime argument
progress_interval_long=1000000      ; if set to 0, will print summary statistics at end of processing for a particular fasta file
                           ; if n > 0, summary statistics will be written after every n reads have been processed.