gnuparallel
About
GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU parallel can then split the input and pipe it into commands in parallel.
Versions and Availability
Softenv keys for gnuparallel on all clusters:
Machine | Version | Softenv Key |
---|---|---|
supermike2 | 20161022 | +gnuparallel-20161022-gcc-4.4.6 |
Softenv FAQ
The information here is applicable to LSU HPC and LONI systems.
Shells
A user may choose between /bin/bash and /bin/tcsh. Details about each shell follow.
/bin/bash
System resource file: /etc/profile
When a user accesses the shell, the following user files are read, in order, if they exist:
- ~/.bash_profile (anything sent to STDOUT or STDERR will cause things like rsync to break)
- ~/.bashrc (interactive login only)
- ~/.profile
When a user logs out of an interactive session, the file ~/.bash_logout is executed if it exists.
The default value of the PATH environment variable is set automatically by SoftEnv. See below for more information.
/bin/tcsh
The file ~/.cshrc is used to customize the user's environment if their login shell is /bin/tcsh.
Softenv
SoftEnv is a utility that helps users manage complex user environments with potentially conflicting application versions and libraries.
System Default Path
When a user logs in, the system /etc/profile or /etc/csh.cshrc (depending on login shell, and mirrored from csm:/cfmroot/etc/profile) calls /usr/local/packages/softenv-1.6.2/bin/use.softenv.sh to set up the default path via the SoftEnv database.
SoftEnv looks for a user's ~/.soft file and updates the variables and paths accordingly.
Viewing Available Packages
The command softenv will provide a list of available packages. The listing will look something like:
    $ softenv
    These are the macros available:
    *   @default
    These are the keywords explicitly available:
        +amber-8            Applications: 'Amber', version: 8 Amber is a
        +apache-ant-1.6.5   Ant, Java based XML make system version: 1.6.
        +charm-5.9          Applications: 'Charm++', version: 5.9 Charm++
        +default            this is the default environment...nukes /etc/
        +essl-4.2           Libraries: 'ESSL', version: 4.2 ESSL is a sta
        +gaussian-03        Applications: 'Gaussian', version: 03 Gaussia
        ... some stuff deleted ...
Managing SoftEnv
The file ~/.soft in the user's home directory is where the different packages are managed. To use a package, add its +keyword to the .soft file. For instance, to add the Amber Molecular Dynamics package to the environment, the end of the .soft file should look like this:
+amber-8
@default
To update the environment after modifying this file, one simply uses the resoft command:
% resoft
The command soft can be used to manipulate the environment from the command line. It takes the form:
$ soft add/delete +keyword
Adding or removing keywords this way requires the user to pay attention to possible order dependencies: for best results, remove keywords in the reverse of the order in which they were added. The soft command is handy for testing individual keys, but it can lead to trouble when several keys are changed. Editing the .soft file and then issuing the resoft command is the recommended way of making multiple changes.
Module names for gnuparallel on all clusters:
Machine | Version | Module |
---|---|---|
qb2 | 20170122 | gnuparallel/20170122 |
smic | 20170122 | gnuparallel/20170122 |
Module FAQ
Modules
Modules is a utility which helps users manage the complex business of setting up their shell environment in the face of potentially conflicting application versions and libraries.
Default Setup
When a user logs in, the system looks for a file named .modules in their home directory. This file contains module commands to set up the initial shell environment.
Viewing Available Modules
The command
$ module avail
displays a list of all the modules available. The list will look something like:
    --- some stuff deleted ---
    velvet/1.2.10/INTEL-14.0.2
    vmatch/2.2.2

    ---------------- /usr/local/packages/Modules/modulefiles/admin -----------------
    EasyBuild/1.11.1    GCC/4.9.0       INTEL-140-MPICH/3.1.1
    EasyBuild/1.13.0    INTEL/14.0.2    INTEL-140-MVAPICH2/2.0
    --- some stuff deleted ---
The module names take the form appname/version/compiler, providing the application name, the version, and information about how it was compiled (if needed).
Managing Modules
Besides avail, there are other basic module commands to use for manipulating the environment. These include:
    add/load mod1 mod2 ... modn . . . Add modules
    rm/unload mod1 mod2 ... modn  . . Remove modules
    switch/swap mod . . . . . . . . . Switch or swap one module for another
    display/show  . . . . . . . . . . List modules loaded in the environment
    avail . . . . . . . . . . . . . . List available module names
    whatis mod1 mod2 ... modn . . . . Describe listed modules
The -h option to module will list all available commands.
Module is currently available only on SuperMIC.
Usage
GNU parallel can be used to run typical serial and MPI-based applications in parallel.
(1) Parallel serial jobs
Example of a blast job on Mike:
    #!/bin/bash
    #PBS -A hpc_smictest3
    #PBS -l nodes=2:ppn=16
    #PBS -l walltime=1:00:00
    #PBS -q workq

    cd $PBS_O_WORKDIR
    export JOBS_PER_NODE=16
    export WDIR=$PBS_O_WORKDIR

    # --progress : show progress
    # --joblog   : write a job log to logfile
    # -j         : jobs per node
    # --slf      : nodes assigned to your job
    # --workdir  : working directory for each job
    # last line  : script_to_parallelize input output :::: joblist
    parallel --progress \
             --joblog logfile \
             -j $JOBS_PER_NODE \
             --slf $PBS_NODEFILE \
             --workdir $WDIR \
             ./cmd_blast.sh {} {/.} :::: input.lst
where: input.lst contains job input list:
    /work/$USER/blast/data/input1.faa
    /work/$USER/blast/data/input2.faa
    ....
    /work/$USER/blast/data/input200.faa
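A list like the one above does not have to be typed by hand. The following is a minimal sketch of one way to build it with a shell glob. The temporary directory and the file names are stand-ins created only for the demonstration; a real job would point data_dir at its own data directory.

```shell
#!/bin/bash
# Stand-in data directory (assumption): replace with your real data
# directory, e.g. the blast/data directory used in the list above.
data_dir=$(mktemp -d)
touch "$data_dir/input1.faa" "$data_dir/input2.faa" "$data_dir/notes.txt"

# Write one path per line, keeping only the .faa files.
list_file="$data_dir/input.lst"
printf '%s\n' "$data_dir"/*.faa > "$list_file"

cat "$list_file"
```

Because the glob is expanded by the shell, the resulting list contains full paths rather than a literal $USER.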
where: cmd_blast.sh is the script for running a serial blast job
For example, ./cmd_blast.sh input1.faa input1 runs a single serial job. The script is:
    #!/bin/bash
    export WDIR=/xxx/xxx
    cd $WDIR
    blastp -query $1 -db db/img_v400_PROT.00 -out output/$2.out -outfmt 7 -max_target_seqs 100 -num_threads 2
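In the job script, parallel fills in two replacement strings for every line of input.lst: {} is the whole input line and {/.} is its basename with the extension removed. They arrive in cmd_blast.sh as $1 and $2. The sketch below reproduces the same transformation with plain bash parameter expansion, which can help when checking what output name a given input will produce. The user name alice is a hypothetical value used in place of $USER.

```shell
#!/bin/bash
# One input line in the format of input.lst, with a hypothetical
# user name substituted for $USER.
input=/work/alice/blast/data/input1.faa

base=${input##*/}   # strip the directory part -> input1.faa
stem=${base%.*}     # strip the last extension -> input1

echo "query file:  $input"             # what {} expands to
echo "output file: output/$stem.out"   # built from what {/.} expands to
```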
(2) Parallel MPI jobs
Example that uses mpirun to run a Laplace MPI program (lap_mpi):
    #!/bin/bash
    #PBS -A your_allocation_name
    #PBS -l walltime=2:00:00
    #PBS -l nodes=4:ppn=16
    #PBS -q checkpt

    export JOBS_PER_NODE=8
    export NPROCS=2
    export WDIR=$PBS_O_WORKDIR
    cd $WDIR

    parallel --progress \
             -j $JOBS_PER_NODE \
             --slf $PBS_NODEFILE \
             --workdir $WDIR \
             ./cmd_mpi.sh {} $NPROCS :::: input.lst
where: cmd_mpi.sh is the script to run one MPI job
    #!/bin/bash
    export WDIR=$PBS_O_WORKDIR
    FILE=$(eval echo $1)
    param=`cat ${FILE}`
    mpirun -ppn $2 $WDIR/lap_mpi $param
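The values of JOBS_PER_NODE and NPROCS in the job script are chosen together. A rough sanity check, assuming that -j applies per node and that each MPI job starts NPROCS processes, is that their product should not exceed the ppn value requested from PBS. The sketch below does that arithmetic with the numbers from the example; the variable names NODES and CORES_PER_NODE are introduced here for illustration only.

```shell
#!/bin/bash
NODES=4             # from: #PBS -l nodes=4:ppn=16
CORES_PER_NODE=16   # the ppn value
JOBS_PER_NODE=8
NPROCS=2

cores_used=$((JOBS_PER_NODE * NPROCS))
concurrent_jobs=$((NODES * JOBS_PER_NODE))

echo "cores used per node: $cores_used of $CORES_PER_NODE"
echo "jobs running at once: $concurrent_jobs"

if [ "$cores_used" -gt "$CORES_PER_NODE" ]; then
    echo "warning: nodes would be oversubscribed"
fi
```

With the example values, 8 jobs of 2 processes each use all 16 cores of a node.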
where: input.lst contains job input list:
    /work/$USER/laplace/data/input1
    /work/$USER/laplace/data/input2
    ....
    /work/$USER/laplace/data/input200

Each input file holds the parameters for one run. For example, cat input1 prints:

    4096 4096 2 2 0.08 20000 0 0
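The paths in input.lst contain the literal text $USER, which is why cmd_mpi.sh passes its first argument through eval echo before reading the file. The sketch below shows that expansion step in isolation. The user name alice is a hypothetical value, set only so that the result is predictable.

```shell
#!/bin/bash
# One line as it appears in input.lst: $USER is literal text here.
line='/work/$USER/laplace/data/input1'

# Hypothetical user name, set only for this demonstration.
USER=alice

# Same construct as in cmd_mpi.sh: eval makes the shell expand $USER.
FILE=$(eval echo "$line")
echo "$FILE"
```

Note that eval runs whatever shell syntax the line contains, so this approach is only appropriate for input lists you wrote yourself.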
Last modified: August 21 2017 10:47:37.