STAR
- Ebuilds: 1, Testing: 2.7.10a Description: STAR aligner: align RNA-seq reads to a reference genome using uncompressed suffix arrays
Homepage:https://github.com/alexdobin/STAR License: GPL-3
aaindex
- Ebuilds: 1, Testing: 9.1-r2 Description:
Amino acid indices and similarity matrices maintained at Kyoto
University. An amino acid index is a set of 20 numerical values
representing any of the different physicochemical and biological
properties of amino acids. The AAindex1 section of the Amino Acid
Index Database is a collection of published indices together with the
result of cluster analysis using the correlation coefficient as the
distance between two indices. This section currently contains 494
indices. Another important feature of amino acids that can be
represented numerically is the similarity between amino acids. Thus, a
similarity matrix, also called a mutation matrix, is a set of 210
numerical values, 20 diagonal and 20x19/2 off-diagonal elements, used
for sequence alignments and similarity searches. The AAindex2 section
of the Amino Acid Index Database is a collection of published amino
acid mutation matrices together with the result of cluster analysis.
This section currently contains 83 matrices.
Homepage:https://www.genome.jp/aaindex/ License: public-domain
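
The element count quoted in the aaindex description (20 diagonal plus 20x19/2 off-diagonal values) can be checked with a short sketch. It assumes only the 20 standard one-letter residue codes; the ordering of the letters is arbitrary, and nothing here reads the AAindex files themselves.

```python
from itertools import combinations_with_replacement

# The 20 standard amino acids as one-letter codes (ordering is arbitrary here).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# A symmetric similarity matrix needs one value per unordered residue pair,
# including each residue paired with itself.
pairs = list(combinations_with_replacement(AMINO_ACIDS, 2))
diagonal = [p for p in pairs if p[0] == p[1]]
off_diagonal = [p for p in pairs if p[0] != p[1]]

print(len(diagonal), len(off_diagonal), len(pairs))  # 20 190 210
```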
bamtools
- Ebuilds: 2, Stable: 2.5.3, Testing: 9999 Description: BAM (Binary Alignment/Map) format is useful for storing large DNA sequence alignments. It is closely related to the text-based SAM format, but is optimized for random access. BamTools provides a fast, flexible C++ API for reading and writing BAM files.
Homepage:https://github.com/pezmaster31/bamtools
bcftools
- Ebuilds: 4, Testing: 1.23 Description: Utilities for variant calling and manipulating VCF and BCF files
Homepage:http://www.htslib.org License: MIT
bedtools
- Ebuilds: 1, Testing: 2.31.1 Description: Tools for manipulation and analysis of BED, GFF/GTF, VCF, SAM/BAM file formats
Homepage:https://bedtools.readthedocs.io/ License: GPL-2
bwa
- Ebuilds: 1, Testing: 0.7.17 Description: Burrows-Wheeler Alignment Tool, a fast short genomic sequence aligner
Homepage:https://github.com/lh3/bwa/ License: GPL-3
cd-hit
- Ebuilds: 1, Testing: 4.6.6-r1 Description:
CD-HIT is a very widely used program for clustering and comparing large sets
of protein or nucleotide sequences. CD-HIT is very fast and can handle
extremely large databases. CD-HIT helps to significantly reduce the
computational and manual efforts in many sequence analysis tasks and aids in
understanding the data structure and correcting the bias within a dataset.
The CD-HIT package has CD-HIT, CD-HIT-2D, CD-HIT-EST, CD-HIT-EST-2D,
CD-HIT-454, CD-HIT-PARA, PSI-CD-HIT and over a dozen scripts. CD-HIT
(CD-HIT-EST) clusters similar proteins (DNAs) into clusters that meet a
user-defined similarity threshold. CD-HIT-2D (CD-HIT-EST-2D) compares 2
datasets and identifies the sequences in db2 that are similar to db1 above
a threshold. CD-HIT-454 is a program to identify natural and artificial
duplicates from pyrosequencing reads. The usage of other programs and
scripts can be found in CD-HIT user's guide.
Homepage:http://weizhong-lab.ucsd.edu/cd-hit/ License: GPL-2
clustal-omega
- Ebuilds: 1, Stable: 1.2.4-r1, Testing: 1.2.4-r1 Description: Scalable multiple alignment of protein sequences
Homepage:http://www.clustal.org/omega/ License: GPL-2
clustalw
- Ebuilds: 2, Stable: 2.1-r2, 1.83-r4, Testing: 2.1-r2 Description: General purpose multiple alignment program for DNA and proteins
Homepage:http://www.clustal.org/ License: GPL-3 LGPL-3
cutg
- Ebuilds: 1, Testing: 160-r1 Description:
Codon usage tables maintained at the Kazusa DNA Research Institute.
Codon usage in individual genes has been calculated using the
nucleotide sequence data obtained from the GenBank Genetic Sequence
Database. The compilation of codon usage is synchronized with each
major release of GenBank.
Homepage:http://www.kazusa.or.jp/codon/ License: public-domain
dialign-tx
- Ebuilds: 1, Testing: 1.0.2-r2 Description: Greedy and progressive approaches for segment-based multiple sequence alignment
Homepage:http://dialign-tx.gobics.de/ License: LGPL-2.1
elph
- Ebuilds: 1, Testing: 1.0.1-r3 Description:
ELPH is a general-purpose Gibbs sampler for finding motifs in a set of
DNA or protein sequences. The program takes as input a set containing
anywhere from a few dozen to thousands of sequences, and searches
through them for the most common motif, assuming that each sequence
contains one copy of the motif.
Homepage:http://cbcb.umd.edu/software/ELPH/ License: Artistic
embassy
- Ebuilds: 1, Testing: 6.6.0-r3 Description: A meta-package for installing all EMBASSY packages (EMBOSS add-ons)
Homepage:http://emboss.sourceforge.net/embassy/ License: metapackage
emboss
- Ebuilds: 1, Testing: 6.6.0-r4 Description:
EMBOSS is "The European Molecular Biology Open Software Suite".
EMBOSS is a free Open Source software analysis package specially
developed for the needs of the molecular biology (e.g. EMBnet) user
community. The software automatically copes with data in a variety
of formats and even allows transparent retrieval of sequence data
from the web. Also, as extensive libraries are provided with the
package, it is a platform to allow other scientists to develop and
release software in true open source spirit. EMBOSS also integrates
a range of currently available packages and tools for sequence
analysis into a seamless whole. EMBOSS breaks the historical trend
towards commercial software packages.
License: Apache-2.0 GPL-3+ CC-BY-3.0
foldingathome
- Ebuilds: 2, Testing: 7.6.21 Description: Folding@Home is a distributed computing project for protein folding
Homepage:https://foldingathome.org/ License: FAH-EULA-2014 FAH-special-permission
gmap
- Ebuilds: 1, Testing: 2020.10.27 Description: A Genomic Mapping and Alignment Program for mRNA and EST Sequences
Homepage:http://research-pub.gene.com/gmap/ License: gmap
mosaik
- Ebuilds: 1, Testing: 2.2.30 Description: A reference-guided aligner for next-generation sequencing technologies
Homepage:https://github.com/wanpinglee/MOSAIK License: GPL-2
mothur
- Ebuilds: 1, Stable: 1.48.2, Testing: 1.48.2 Description: Suite of algorithms for ecological bioinformatics
Homepage:https://mothur.org/ License: GPL-3
mrbayes
- Ebuilds: 2, Testing: 3.2.7 Description:
MrBayes is a program for the Bayesian estimation of phylogeny.
Bayesian inference of phylogeny is based upon a quantity called the
posterior probability distribution of trees, which is the probability of a
tree conditioned on the observations. The conditioning is accomplished using
Bayes's theorem. The posterior probability distribution of trees is
impossible to calculate analytically; instead, MrBayes uses a simulation
technique called Markov chain Monte Carlo (or MCMC) to approximate the
posterior probabilities of trees.
Homepage:https://nbisweden.github.io/MrBayes/ License: GPL-2
phyml
- Ebuilds: 1, Stable: 2.4.5-r4, Testing: 2.4.5-r4 Description:
Phyml is a simple, fast, and accurate algorithm to estimate large
phylogenies by maximum likelihood. Given input sequence files, it
estimates phylogenies using maximum likelihood, and is capable of
processing large amounts of phylogenetic data.
Homepage:http://atgc.lirmm.fr/phyml/ License: GPL-2
piler
- Ebuilds: 1, Stable: 1.0-r2, Testing: 1.0-r2 Description: Analysis of repetitive DNA found in genome sequences
Homepage:http://www.drive5.com/piler/ License: public-domain
primer3
- Ebuilds: 1, Testing: 2.3.7-r1 Description:
Primer3 picks primers for PCR reactions, considering: oligonucleotide
melting temperature, size, GC content, and primer-dimer possibilities;
PCR product size; positional constraints within the source sequence;
and miscellaneous other constraints. All of these criteria are
user-specifiable as constraints, and some are specifiable as terms in
an objective function that characterizes an optimal primer pair.
Homepage:http://primer3.sourceforge.net/ License: GPL-2
prints
- Ebuilds: 1, Testing: 39.0-r2 Description:
A protein motif fingerprint database maintained at the University of
Manchester. A fingerprint is a group of conserved motifs used to
characterise a protein family; its diagnostic power is refined by
iterative scanning of a SWISS-PROT/TrEMBL composite. Usually the motifs
do not overlap, but are separated along a sequence, though they may be
contiguous in 3D-space. Fingerprints can encode protein folds and
functionalities more flexibly and powerfully than can single motifs,
full diagnostic potency deriving from the mutual context provided by
motif neighbours.
Homepage:http://www.bioinf.man.ac.uk/dbbrowser/PRINTS/ License: public-domain
prosite
- Ebuilds: 1, Testing: 2017.02-r1 Description:
A protein families and domains database maintained at the Swiss
Institute of Bioinformatics. It consists of biologically significant
sites, patterns and profiles that help to reliably identify to which
known protein family (if any) a new sequence belongs. PROSITE currently
contains patterns and profiles specific for more than a thousand
protein families or domains. Each of these signatures comes with
documentation providing background information on the structure and
function of these proteins.
Homepage:https://prosite.expasy.org/ License: swiss-prot
raxml
- Ebuilds: 1, Testing: 8.2.13 Description: Sequential, Parallel & Distributed Inference of Large Phylogenetic Trees
Homepage:https://github.com/stamatak/standard-RAxML License: GPL-2
rebase
- Ebuilds: 1, Testing: 1901-r2 Description:
The Restriction Enzyme data BASE is a collection of information about
restriction enzymes and related proteins. It is maintained by New
England Biolabs. It contains published and unpublished references,
recognition and cleavage sites, isoschizomers, commercial availability,
methylation sensitivity, crystal and sequence data. DNA
methyltransferases, homing endonucleases, nicking enzymes, specificity
subunits and control proteins are also included. More recently,
putative DNA methyltransferases and restriction enzymes, as predicted
from analysis of genomic sequences, are also listed.
Homepage:http://rebase.neb.com License: public-domain
samtools
- Ebuilds: 5, Stable: 1.20, Testing: 1.23 Description: Utilities for analysing and manipulating the SAM/BAM alignment formats
Homepage:http://www.htslib.org/ License: MIT
seaview
- Ebuilds: 1, Testing: 4.6-r2 Description:
SeaView is a graphical multiple sequence alignment editor developed by
Manolo Gouy. SeaView is able to read and write various alignment
formats (NEXUS, MSF, CLUSTAL, FASTA, PHYLIP, MASE). It allows the user
to edit the alignment manually, and also to run DOT-PLOT or CLUSTALW
programs to locally improve the alignment.
Homepage:http://pbil.univ-lyon1.fr/software/seaview.html License: public-domain
sibsim4
- Ebuilds: 1, Stable: 0.20, Testing: 0.20 Description: A rewrite and improvement upon sim4, a DNA-mRNA aligner
Homepage:http://sibsim4.sourceforge.net/ License: GPL-2
sim4
- Ebuilds: 1, Testing: 20030921-r2 Description:
sim4 is a similarity-based tool for aligning an expressed DNA sequence
(EST, cDNA, mRNA) with a genomic sequence for the gene. It also detects
end matches when the two input sequences overlap at one end (i.e., the
start of one sequence overlaps the end of the other). sim4 employs a
blast-based technique to first determine the basic matching blocks
representing the "exon cores". In this first stage, it detects all
possible exact matches of W-mers (i.e., DNA words of size W) between
the two sequences and extends them to maximal scoring gap-free
segments. In the second stage, the exon cores are extended into the
adjacent as-yet-unmatched fragments using greedy alignment algorithms,
and heuristics are used to favor configurations that conform to the
splice-site recognition signals (GT-AG, CT-AC). If necessary, the
process is repeated with less stringent parameters on the unmatched
fragments.
Homepage:http://globin.cse.psu.edu/html/docs/sim4.html License: GPL-2
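
The first stage described for sim4 (exact W-mer matches extended without gaps) can be illustrated with a minimal sketch. This is not sim4's code: the function names, the word size of 6, and the toy sequences are hypothetical, and the extension here stops at the first mismatch, whereas sim4 extends to maximal scoring segments.

```python
from collections import defaultdict

def wmer_matches(genomic, cdna, w):
    """Return (genomic_pos, cdna_pos) pairs where a word of length w matches exactly."""
    index = defaultdict(list)
    for i in range(len(genomic) - w + 1):
        index[genomic[i:i + w]].append(i)
    hits = []
    for j in range(len(cdna) - w + 1):
        for i in index.get(cdna[j:j + w], []):
            hits.append((i, j))
    return hits

def extend_gap_free(genomic, cdna, i, j, w):
    """Extend an exact hit in both directions while bases match (simplified: no mismatches)."""
    start_i, start_j = i, j
    while start_i > 0 and start_j > 0 and genomic[start_i - 1] == cdna[start_j - 1]:
        start_i -= 1
        start_j -= 1
    end_i, end_j = i + w, j + w
    while end_i < len(genomic) and end_j < len(cdna) and genomic[end_i] == cdna[end_j]:
        end_i += 1
        end_j += 1
    return start_i, start_j, end_i - start_i  # genomic start, cDNA start, segment length

# Toy data: two 8-base "exons" separated by a 5-base "intron" in the genomic sequence.
genomic = "TTTTACGTGACCAAAAAGGCTTAGCTTTT"
cdna = "ACGTGACCGGCTTAGC"
segments = sorted({extend_gap_free(genomic, cdna, i, j, 6)
                   for i, j in wmer_matches(genomic, cdna, 6)})
print(segments)
```

Each tuple in `segments` is a candidate exon core; the later stages described above (greedy extension and splice-site heuristics) are not covered by this sketch.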
stride
- Ebuilds: 2, Stable: 20060723, Testing: 20060723 Description: Protein secondary structure assignment from atomic coordinates
Homepage:http://webclu.bio.wzw.tum.de/stride/ License: STRIDE
t-coffee
- Ebuilds: 1, Testing: 11.00-r3 Description:
T-Coffee is a multiple sequence alignment package. Given a set of
sequences (Proteins or DNA), T-Coffee generates a multiple sequence
alignment. Version 2.00 and higher can mix sequences and structures.
T-Coffee allows the combination of a collection of multiple/pairwise,
global or local alignments into a single model. It can also estimate
the level of consistency of each position within the new
alignment with the rest of the alignments.
Homepage:http://www.tcoffee.org/Projects_home_page/t_coffee_home_page.html License: GPL-2
tree-puzzle
- Ebuilds: 1, Testing: 5.2-r1 Description:
TREE-PUZZLE is a computer program to reconstruct phylogenetic trees
from molecular sequence data by maximum likelihood. It implements a
fast tree search algorithm, quartet puzzling, that allows analysis of
large data sets and automatically assigns estimations of support to
each internal branch. TREE-PUZZLE also computes pairwise maximum
likelihood distances as well as branch lengths for user specified
trees. Branch lengths can be calculated under the clock-assumption. In
addition, TREE-PUZZLE offers a novel method, likelihood mapping, to
investigate the support of a hypothesized internal branch without
computing an overall tree and to visualize the phylogenetic content of
a sequence alignment. TREE-PUZZLE also conducts a number of statistical
tests on the data set (chi-square test for homogeneity of base
composition, likelihood ratio clock test, Kishino-Hasegawa test). The
models of substitution provided by TREE-PUZZLE are TN, HKY, F84, SH for
nucleotides, Dayhoff, JTT, mtREV24, VT, WAG, BLOSUM 62 for amino acids,
and F81 for two-state data. Rate heterogeneity is modeled by a discrete
Gamma distribution and by allowing invariable sites. The corresponding
parameters can be inferred from the data set.
Homepage:http://www.tree-puzzle.de License: GPL-2
treeviewx
- Ebuilds: 1, Stable: 0.5.1.20100823_p4-r1, Testing: 0.5.1.20100823_p4-r1 Description:
TreeView X is a program for displaying phylogenetic trees on Linux and
UNIX platforms. It can read and display NEXUS and Newick format tree
files (such as those output by PAUP*, ClustalX, TREE-PUZZLE, and other
programs).
Homepage:https://github.com/rdmpage/treeviewx License: GPL-2
trnascan-se
- Ebuilds: 1, Testing: 1.31-r3 Description:
tRNAscan-SE detects ~99% of eukaryotic nuclear or prokaryotic tRNA
genes, with a false positive rate of less than one per 15 gigabases,
and with a search speed of about 30 kb/second. It was implemented for
large-scale human genome sequence analysis, but is applicable to
other DNAs as well.
Homepage:http://lowelab.ucsc.edu/tRNAscan-SE/ License: GPL-2
uchime
- Ebuilds: 1, Stable: 4.2.40-r1, Testing: 4.2.40-r1 Description:
UCHIME is a new algorithm for detecting chimeric sequences. It was developed in
collaboration with Brian Haas, Jose Carlos Clemente, Chris Quince and Rob
Knight. Chimeras are commonly created during DNA sample amplification by
PCR, especially in community sequencing experiments using single regions
such as the 16S rRNA gene in bacteria or the fungal ITS region. UCHIME can
detect chimeras using a reference database or de novo using abundance
information on the assumption that chimeras are less abundant than their
parents because they must have undergone fewer rounds of amplification.
Homepage:https://www.drive5.com/usearch/manual/uchime_algo.html License: public-domain
ucsc-genome-browser
- Ebuilds: 1, Testing: 260-r2 Description: The UCSC genome browser suite, also known as Jim Kent's library and GoldenPath
Homepage:http://genome.ucsc.edu/ License: blat