cd-hit
About
CD-HIT - Cluster Database at High Identity with Tolerance.
CD-HIT takes a FASTA-format sequence database as input and produces a set of 'non-redundant' (nr) representative sequences as output. In addition, cd-hit outputs a cluster file that documents the sequence 'groupies' for each nr representative. The idea is to reduce the overall size of the database without losing sequence information, by removing only 'redundant' (highly similar) sequences. This is why the resulting database is called non-redundant (nr). Essentially, cd-hit produces a set of closely related protein families from a given FASTA sequence database.
CD-HIT uses a 'longest sequence first' list removal algorithm to remove sequences above a certain identity threshold. Additionally, the algorithm implements a very fast heuristic to find high-identity segments between sequences, and so can avoid many costly full alignments.
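The 'longest sequence first' idea can be sketched in a few lines of bash. This is a toy illustration only, not CD-HIT's implementation: the identity function below is a hypothetical stand-in that compares sequences position by position without gaps, whereas CD-HIT uses its own heuristic and alignment. The function names, the 90% threshold, and the three sequences are made up for the example.

```shell
#!/usr/bin/env bash
# Toy sketch of greedy "longest sequence first" clustering.
# NOT CD-HIT's code: identity() is an ungapped, left-aligned comparison
# chosen only to keep the example short.

# identity A B: percent of matching positions over the shorter sequence.
identity() {
  local a=$1 b=$2 n=${#1} i matches=0
  if (( ${#b} < n )); then n=${#b}; fi
  if (( n == 0 )); then echo 0; return; fi
  for (( i = 0; i < n; i++ )); do
    if [[ "${a:i:1}" == "${b:i:1}" ]]; then matches=$(( matches + 1 )); fi
  done
  echo $(( 100 * matches / n ))
}

# greedy_cluster THRESHOLD SEQ...: print one line per sequence, either
# "rep SEQ" (a new representative) or "member SEQ -> REP" (redundant).
greedy_cluster() {
  local threshold=$1; shift
  local -a reps=()
  local seq rep placed
  # Visit sequences from longest to shortest.
  while IFS= read -r seq; do
    placed=0
    for rep in "${reps[@]}"; do
      if (( $(identity "$rep" "$seq") >= threshold )); then
        echo "member $seq -> $rep"
        placed=1
        break
      fi
    done
    if (( placed == 0 )); then
      reps+=("$seq")
      echo "rep $seq"
    fi
  done < <(printf '%s\n' "$@" | awk '{ print length($0), $0 }' | sort -s -k1,1nr | cut -d' ' -f2-)
}

greedy_cluster 90 MKTAYIAKQRQISFVKSH GGGGGGGGGGGGGGG MKTAYIAKQRQISFVKSHFS
```

In this toy run the 20-residue sequence becomes the first representative, the 18-residue sequence is absorbed into its cluster, and the unrelated sequence starts a cluster of its own.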
Versions and Availability
Module Names for cd-hit on philip
Machine | Version | Module Name |
---|---|---|
philip | 4.6.1 | cd-hit/4.6.1/INTEL-15.0.3 |
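A session on philip might load the module listed above and then run cd-hit on a FASTA file. The sketch below assumes the module puts a `cd-hit` executable on PATH. The file names and the 0.9 identity threshold are placeholders, and each step is skipped when `module`, `cd-hit`, or the input file is not available.

```shell
#!/usr/bin/env bash
# Sketch: load the cd-hit module from the table above and cluster a FASTA file.
# input.fasta and nr90.fasta are hypothetical file names.
module_name="cd-hit/4.6.1/INTEL-15.0.3"
input="input.fasta"
output="nr90.fasta"

if type module >/dev/null 2>&1; then
  module load "$module_name"
else
  echo "module command not found; skipping 'module load $module_name'"
fi

if command -v cd-hit >/dev/null 2>&1 && [ -f "$input" ]; then
  # -i  input FASTA file
  # -o  output FASTA file of representative sequences
  # -c  sequence identity threshold
  # -n  word length (5 is the usual choice for thresholds of 0.7 and above)
  cd-hit -i "$input" -o "$output" -c 0.9 -n 5
  # cd-hit also writes the cluster listing to "$output.clstr".
else
  echo "cd-hit or $input not available; nothing to run"
fi
```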
Module FAQ
The information here is applicable to LSU HPC and LONI systems.
Shells
A user may choose between /bin/bash and /bin/tcsh. Details about each shell follow.
/bin/bash
System resource file: /etc/profile
When a user accesses the shell, the following user files are read, in order, if they exist:
- ~/.bash_profile (anything sent to STDOUT or STDERR will cause things like rsync to break)
- ~/.bashrc (interactive login only)
- ~/.profile
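Because output from startup files can break tools such as rsync, one common precaution is to print messages only when the shell is interactive. The fragment below is a sketch of that idea, assuming bash. The `is_interactive` helper and the greeting text are illustrative and not part of the system configuration.

```shell
# Hypothetical fragment for a bash startup file.
# bash includes "i" in the special parameter $- when the shell is interactive.
is_interactive() {
  case $- in
    *i*) return 0 ;;
    *)   return 1 ;;
  esac
}

# Only write to STDOUT when a person is at the terminal.
if is_interactive; then
  echo "Logged in to ${HOSTNAME:-this host}"
fi
```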
When a user logs out of an interactive session, the file ~/.bash_logout is executed if it exists.
The default value of the environment variable PATH is set automatically using SoftEnv. See below for more information.
/bin/tcsh
The file ~/.cshrc is used to customize the user's environment if their login shell is /bin/tcsh.
Modules
Modules is a utility which helps users manage the complex business of setting up their shell environment in the face of potentially conflicting application versions and libraries.
Default Setup
When a user logs in, the system looks for a file named .modules in their home directory. This file contains module commands to set up the initial shell environment.
Viewing Available Modules
The command
$ module avail
displays a list of all the modules available. The list will look something like:
--- some stuff deleted ---
velvet/1.2.10/INTEL-14.0.2 vmatch/2.2.2

---------------- /usr/local/packages/Modules/modulefiles/admin -----------------
EasyBuild/1.11.1       GCC/4.9.0              INTEL-140-MPICH/3.1.1
EasyBuild/1.13.0       INTEL/14.0.2           INTEL-140-MVAPICH2/2.0
--- some stuff deleted ---
The module names take the form appname/version/compiler, providing the application name, the version, and information about how it was compiled (if needed).
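As an illustration of this naming scheme, a module name can be split into its fields with plain bash. This is a minimal sketch. The `split_module` helper is hypothetical and not part of the Modules utility, and it reports `none` when a name carries no compiler field.

```shell
#!/usr/bin/env bash
# split_module NAME: print the app, version, and compiler fields of a module name.
split_module() {
  local app version compiler
  # Split on "/" for this read only; a missing third field leaves compiler empty.
  IFS=/ read -r app version compiler <<< "$1"
  printf 'app=%s version=%s compiler=%s\n' "$app" "$version" "${compiler:-none}"
}

split_module "cd-hit/4.6.1/INTEL-15.0.3"
split_module "vmatch/2.2.2"
```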
Managing Modules
Besides avail, there are other basic module commands to use for manipulating the environment. These include:
add/load mod1 mod2 ... modn . . . Add modules
rm/unload mod1 mod2 ... modn . . Remove modules
switch/swap mod . . . . . . . . . Switch or swap one module for another
display/show mod . . . . . . . . Show the environment changes a module makes
list . . . . . . . . . . . . . . List modules loaded in the environment
avail . . . . . . . . . . . . . . List available module names
whatis mod1 mod2 ... modn . . . . Describe listed modules
The -h option to module will list all available commands.
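A short walk-through of these commands, using the cd-hit module name from the table near the top of this page, might look like the sketch below. The `run_module` wrapper is a hypothetical helper added so that the script still runs on a machine without the `module` command. On a system with Modules installed, the commands can be typed directly.

```shell
#!/usr/bin/env bash
# run_module ARGS...: call "module ARGS" when the module command exists,
# otherwise only report what would have been run.
run_module() {
  if type module >/dev/null 2>&1; then
    module "$@"
  else
    echo "skipped: module $*"
  fi
}

run_module avail                               # list available modules
run_module whatis cd-hit/4.6.1/INTEL-15.0.3    # short description of the module
run_module load cd-hit/4.6.1/INTEL-15.0.3      # add it to the environment
run_module list                                # list the modules currently loaded
run_module unload cd-hit/4.6.1/INTEL-15.0.3    # remove it again
```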
Module is currently available only on SuperMIC.
Resources
Last modified: August 21 2017 10:47:37.