Task: Next Generation Sequencing
Metapackage: false
Description: Debian Med bioinformatics applications usable in Next Generation Sequencing
 This task collects packages that specialize in the alignment of
 sequences produced by next generation sequencing.

Comment: Do not build a metapackage because it is unclear how complete this
 set of packages is with regard to NGS.

Depends:
	bedtools,
	bwa,
	bowtie,
	fastx-toolkit,
	filo,
	last-align,
	maq,
	picard-tools,
	r-bioc-edger,
	r-bioc-hilbertvis,
	samtools,
	sra-toolkit,
	ssake,
	tabix,
	vcftools,
	velvet

Depends: mothur

Depends: qiime

Depends: cufflinks
Homepage: http://cufflinks.cbcb.umd.edu/
Vcs-Git: http://git.debian.org/?p=debian-med/cufflinks.git
License: Boost
Pkg-Description: Transcript assembly, differential expression, and differential regulation for RNA-Seq
 Cufflinks assembles transcripts, estimates their abundances, and tests for
 differential expression and regulation in RNA-Seq samples. It accepts aligned
 RNA-Seq reads and assembles the alignments into a parsimonious set of
 transcripts. Cufflinks then estimates the relative abundances of these
 transcripts based on how many reads support each one. 

Depends: mira-assembler
Published-Title: Using the miraEST Assembler for Reliable and Automated mRNA Transcript Assembly and SNP Detection in Sequenced ESTs
Published-Authors: Chevreux B, Pfisterer T, Drescher B, Driesel AJ, Müller WE, Wetter T, Suhai S.
Published-In: Genome Res. Jun;14(6):1147-59.
Published-Year: 2004
Published-doi: 10.1101/gr.1917404
Published-URL: http://pubmed.org/15140833

Depends: ssaha2
Homepage: http://www.sanger.ac.uk/resources/software/ssaha2/
License: to be clarified
Pkg-Description: pairwise sequence alignment program
 SSAHA2 (Sequence Search and Alignment by Hashing Algorithm) is a
 pairwise sequence alignment program designed for the efficient
 mapping of sequencing reads onto genomic reference sequences. Reads
 from most sequencing platforms (ABI-Sanger, Roche 454,
 Illumina-Solexa) and a range of output formats (SAM, CIGAR, PSL etc.)
 are supported. A pile-up pipeline for analysis and genotype calling
 is available as a separate package.
Published-Title: SSAHA: a fast search method for large DNA databases.
Published-Authors: Ning Z, Cox AJ and Mullikin JC
Published-In: Genome research 2001;11;10;1725-9
Published-doi: 10.1101/gr.194201

Depends: mosaik-aligner
Homepage: http://code.google.com/p/mosaik-aligner/
License: GPL
Pkg-Description: reference-guided aligner for next-generation sequencing
 MosaikBuild converts various sequence formats into Mosaik’s native
 read format. MosaikAligner pairwise aligns each read to a specified
 series of reference sequences. MosaikSort resolves paired-end reads
 and sorts the alignments by the reference sequence
 coordinates. Finally, MosaikText converts alignments to different
 text-based formats.
 .
 At this time, the workflow consists of supplying sequences in FASTA,
 FASTQ, Illumina Bustard & Gerald, or SRF file formats and producing
 results in the BLAT axt, the BAM/SAM, the UCSC Genome Browser bed, or
 the Illumina ELAND formats.


Depends: forge
Homepage: http://combiol.org/forge/
License: Apache 2.0
Pkg-Description: genome assembler for mixed read types
 Forge Genome Assembler is a parallel, MPI based genome assembler for
 mixed read types.
 .
 Forge is a classic "Overlap layout consensus" genome assembler written
 by Darren Platt and Dirk Evers. Implemented in C++ and using the
 parallel MPI library, it runs on one or more machines in a network and
 can scale to very large numbers of reads provided there is enough
 collective memory on the machines used. It generates a full consensus
 alignment of all reads and can handle mixtures of Sanger, 454 and
 Illumina reads. There is some support for SOLiD color space, and it
 includes built-in tools for vector trimming and contamination screening.
 .
 Forge was originally developed at Exelixis, which has kindly agreed to
 place the software, which underwent much subsequent development outside
 Exelixis, into the public domain. Forge works with most of the common
 MPI implementations.
Remark: Competitor to MIRA2 and wgs-assembler
 This package was requested by William Spooner <whs@eaglegenomics.com> as
 a competitor to MIRA2 and wgs-assembler.

Depends: uc-echo
Homepage: http://uc-echo.sourceforge.net/
License: BSD License
Pkg-Description: error correction algorithm designed for short reads from next-generation sequencing
 ECHO is an error correction algorithm designed for short reads from
 next-generation sequencing platforms such as Illumina's Genome Analyzer II.
 The algorithm uses a Bayesian framework to improve the quality of the reads
 in a given data set by employing maximum a posteriori estimation.

Depends: annovar
Homepage: http://www.openbioinformatics.org/annovar/
License: Open Source for non-profit
Pkg-Description: annotate genetic variants detected from diverse genomes 
 ANNOVAR is an efficient software tool to utilize up-to-date information
 to functionally annotate genetic variants detected from diverse genomes
 (including human genome hg18, hg19, as well as mouse, worm, fly, yeast and 
 many others). Given a list of variants with chromosome, start position, end 
 position, reference nucleotide and observed nucleotides, ANNOVAR can perform:
 .
  1. Gene-based annotation: identify whether SNPs or CNVs cause protein coding 
     changes and the amino acids that are affected. Users can flexibly use RefSeq 
     genes, UCSC genes, ENSEMBL genes, GENCODE genes, or many other gene definition
     systems.
  2. Region-based annotation: identify variants in specific genomic regions,
     for example, conserved regions among 44 species, predicted transcription
     factor binding sites, segmental duplication regions, GWAS hits, database
     of genomic variants, DNAse I hypersensitivity sites, ENCODE 
     H3K4Me1/H3K4Me3/H3K27Ac/CTCF sites, ChIP-Seq peaks, RNA-Seq peaks, or many
     other annotations on genomic intervals. 
  3. Filter-based annotation: identify variants that are reported in dbSNP,
     or identify the subset of common SNPs (MAF>1%) in the 1000 Genomes Project,
     or identify the subset of non-synonymous SNPs with SIFT score>0.05, or many
     other annotations on specific mutations.
  4. Other functionalities: retrieve the nucleotide sequence at any
     user-specified genomic positions in batch, identify a candidate gene list
     for Mendelian diseases from exome data, identify a list of SNPs from
     1000 Genomes that are in strong LD with a GWAS hit, and many other
     creative utilities.
 .
 On a modern desktop computer (3GHz Intel Xeon CPU, 8 GB memory), for
 4.7 million variants, ANNOVAR requires ~4 minutes to perform
 gene-based functional annotation, or ~15 minutes to perform a stepwise
 "variants reduction" procedure, making it practical to handle hundreds
 of human genomes in a day.
