roche454ace2caf



Table of contents:

  1. Introduction
  2. Screendumps
  3. Pipeline, Example 1
  4. Pipeline, Example 2
  5. Pipeline, Example 3
  6. News/History
  7. Sources and other links


1) Introduction
Here are some tools to convert GS20 or FLX assemblies (454Contigs.ace) into STADEN format
so that they can be viewed and edited correctly within the Staden package (gap4).
You then get:
- all further programs are open source
- a full graphical overview of the assembly
- exactly aligned traces with their positions, base values, etc.
- respect for quality clipping information
- access to "hidden data"
- display of the associated flowgram traces in SFF format (tested with staden-1-7-0)

For a description and the goals, please take a look at Poster.pdf.


2) Screendumps:
2.1) Assembly with "show cutoffs" enabled: assembly_with_cutoffs.gif
2.2) Assembly with "show differences by dots" enabled: gap4_by_dots.gif
2.3) Trace view: trace_example.gif


3) Pipeline example 1:
			
1) Create traces with runPhoenix and create the assembly with runAssembly and/or runMapping.
   This results in one or more EIKxxx.sff trace files and one 454Contigs.ace.
   It is best to create a new directory containing these files (copy or link them):

   454Contigs.ace
   EIK12345a.sff
   EIK12345b.sff

2) Create FASTA and quality files from each trace:

   bash one-liner, e.g.:
   
    % for f in *.sff; do F=${f%.sff}; sff_dump -f "$F.fna" -q "$F.qual" "$f"; done

   So you now have:
   
   454Contigs.ace
   EIK12345a.sff
   EIK12345b.sff
   EIK12345a.fna   <---
   EIK12345a.qual  <---
   EIK12345b.fna   <---
   EIK12345b.qual  <---
   ...
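
   Before moving on, it may be worth confirming that every trace really got both
   output files. The following is only a sketch, assuming the naming scheme shown
   above (NAME.sff gives NAME.fna and NAME.qual). The function name
   check_sff_outputs is made up for illustration and is not part of the package:

```shell
#!/bin/bash
# Hypothetical helper (not part of the package): report every .sff file in a
# directory whose .fna or .qual companion is missing or empty.
check_sff_outputs() {
    local dir=${1:-.} f base ext missing=0
    for f in "$dir"/*.sff; do
        [ -e "$f" ] || continue              # the glob matched nothing
        case $f in *_hash.sff) continue ;; esac   # skip hashed copies from step 6
        base=${f%.sff}
        for ext in fna qual; do
            if [ ! -s "$base.$ext" ]; then
                echo "missing or empty: $base.$ext"
                missing=1
            fi
        done
    done
    return "$missing"
}
```

   Called as "check_sff_outputs ." in the working directory, it prints nothing and
   returns 0 when all files are present.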
      
3) Check or modify the lines with the glob expressions 'glob(E*.fna)' and 'glob(E*.qual)' within roche454ace2caf.pl
   and also check that the path to your external executable 'align_to_scf' is correct (set via the RPATH shell variable).
   
4) Convert assembly from ACE into CAF format:
	
    % roche454ace2caf.pl -i 1 -c 5 >454.caf 2>454.err 
    
   ( -i 1 = enable partial trace names like EIXXXX_to1010 and EIXXXX_fm1212 )
   ( -c 5 = don't convert traces shorter than 5 bases (only seen in older roche454 versions) )

   Also available:
   ( -h      help )
   ( -a      add/duplicate the contig as an additional trace, because STADEN generates its own consensus quality values )
   ( -f xx = read contigs from a different ace file )
   ( -q xx = read quality values from a different source )
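
   A quick sanity check of the result can save a failed caf2gap run. The sketch
   below counts reads and contigs in the CAF file. It assumes that the converter
   writes the usual CAF type lines "Is_read" and "Is_contig" at the start of a
   line, which is not stated on this page. caf_summary is a hypothetical name:

```shell
#!/bin/bash
# Hypothetical helper: count read and contig entries in a CAF file.
# Assumption: each entry carries a line starting with Is_read or Is_contig.
caf_summary() {
    local caf=$1 reads contigs
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
    reads=$(grep -c '^Is_read' "$caf" || true)
    contigs=$(grep -c '^Is_contig' "$caf" || true)
    echo "reads=$reads contigs=$contigs"
}
```

   For example, "caf_summary 454.caf" prints one line; zero contigs would suggest
   looking into 454.err before continuing.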
   
   
5) Create GAP4 database from CAF file:

    % caf2gap -project 454 -version 0 < 454.caf >x.out 2>x.err 
    
    
6) Optionally, to speed up the display of traces in gap4, convert the sff files into hashed sff files:
   
    % for f in *[A-Z0-9].sff; do F=${f%.sff}; hash_sff -o "${F}_hash.sff" "$f"; done
   
7) Create a script gap4sff like this:

    #!/bin/bash
    TRACE_PATH=$(ls sff/*_hash.sff | sed 's/^/:HASH=/' | tr -d '\012')
    export TRACE_PATH
    if [ -f /opt/staden/staden.profile ]
    then
      . /opt/staden/staden.profile
      exec /opt/staden/${MACHINE}-bin/gap4 ${@+"$@"}
    else
      echo "Can't find any suitable staden environment"
    fi

   and finally run it:

    % ./gap4sff 454.0
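
The TRACE_PATH value in this script is built by piping ls through sed and tr, which can break on file names containing whitespace. A possible alternative, shown only as a sketch, is to loop over the glob directly. It produces the same ":HASH=file:HASH=file" string, including the leading colon. The function name build_trace_path is made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper: build the TRACE_PATH string from all *_hash.sff files
# in a directory (default: sff), one ":HASH=<file>" entry per file.
build_trace_path() {
    local dir=${1:-sff} f path=""
    for f in "$dir"/*_hash.sff; do
        [ -e "$f" ] || continue      # the glob matched nothing
        path="$path:HASH=$f"
    done
    printf '%s\n' "$path"
}
```

Inside gap4sff this could be used as TRACE_PATH=$(build_trace_path sff) followed by export TRACE_PATH.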

4) Pipeline example 2:

When using foreign or remote services for assemblies, often only the quality file 454Contigs.qual is available.
Then you can replace step 4 with:

 % roche454ace2caf.pl -q 454Contigs.qual -f 454Contigs.ace >454.caf 2>454.err 
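
Which variant of step 4 applies depends on the quality files at hand. The sketch below only prints a suggestion and runs nothing. The selection rule (per-trace E*.qual files first, otherwise 454Contigs.qual) is an assumption made for illustration, and suggest_step4 is a hypothetical name:

```shell
#!/bin/bash
# Hypothetical helper: print the step-4 command line that matches the
# quality files found in a directory. It does not execute anything.
suggest_step4() {
    local dir=${1:-.} f
    for f in "$dir"/E*.qual; do
        if [ -e "$f" ]; then         # per-trace quality files exist (example 1)
            echo "roche454ace2caf.pl -i 1 -c 5"
            return 0
        fi
    done
    if [ -f "$dir/454Contigs.qual" ]; then   # contig qualities only (example 2)
        echo "roche454ace2caf.pl -q 454Contigs.qual -f 454Contigs.ace"
        return 0
    fi
    echo "no quality files found in $dir" >&2
    return 1
}
```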


5) Pipeline example 3:

You can also run the all-in-one utility script roche2gap:

 % roche2gap -d gap4 -p HKI -v 1 

6) News / History:

roche454ace2gap.pl - V1.10 (08.12.2010):

- Added option -a to enable/disable adding (or duplicating) the contig as an additional read.
The feature was hard-wired (always on) when it was introduced in version 1.09.

align_to_scf - V1.06 (04.09.2009):

- Some bugs were reported with large sequences, possibly caused by a rare compiler bug with old binaries and/or an optimization problem. Recompiling solved the problem; the exact cause remains obscure. No other changes.

align_to_scf - V1.05 (29.01.2009):

- bug fixed with large line length

align_to_scf - V1.04 (24.12.2008):

- bug fixed in case of large stretched sequences filled by too many dashes

sff_dump - V1.04 (11.11.2008):

- Bug fixed under rare compiler optimization options (gcc -O2 ...)
- Speed-up: now runs 30% faster

roche454ace2gap.pl - V1.09 (21.10.2008):

- Fixed bug: a negative starting position was calculated for a few traces in special cases.
- Add-on: the original contig sequence is added to the assembly as a single read because STADEN
calculates its own, sometimes different, consensus.
- Old 'illegal' traces are now named 'partial' traces in the help text.


roche454ace2gap.pl - V1.08a (20.05.2008):

Thanks to helpful feedback from James Bonfield (jb) of Sanger, the executable 'align_to_scf' (v1.02) now runs so much faster (from hours down to seconds) that the parallel variant and its subroutines are no longer needed and have been removed from the code.

roche454ace2gap.pl - V1.08 (14.04.2008):

Due to changes in the Roche(R) off-instrument program 'runAssembler',
the following changes have been incorporated.
The quality clipping information can be stored in several different ways:
- no clipping information (QL == 1, QR == length of the good clipped sequence)
- clipping information on the trim line (e.g. TRIM=5-123)
- clipping information now on the QA line (QL=2, QR=123)

The time-consuming 'align_to_scf' now runs via the script run_align2scf.sh, which first splits 454contigs.qry into
several 454contigs.qrx files and then calls make to run align_to_scf in parallel (make -j8).
At the end, all results are pasted back into 454contigs.aln.




7) Download & Sources:
  • You will mostly be using these conversion tools on Linux (at least x86 architecture), so I have now provided a tar archive.

    You should download the latest version, roche454ace2gap-2010-12-08.tgz!
    Decompress and unpack it with GNU tar:

    % mkdir -p /usr/local/roche2gap
    % cd /usr/local/
    % gtar -xvf roche454ace2gap-2010-12-08.tgz

    It contains binaries, scripts and sources and should be extracted into /usr/local/roche2gap/.
    If you set RPATH, you can use any other local directory for the following programs:
    - bin/sff_dump
    - bin/align_to_scf
    - bin/roche454ace2caf.pl
    - bin/caf2gap
    - bin/ace_contig_coverage.pl
    - bin/ace2caf_minimal.pl
    - bin/ace_split.pl
    - bin/acestatus.pl
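
    After unpacking, a short check that the main programs are in place can avoid
    confusing errors later. This is only a sketch: it assumes that RPATH names the
    installation root with the programs in its bin/ subdirectory, as the list
    above suggests, and it falls back to /usr/local/roche2gap. The function name
    check_roche2gap_tools is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: verify that the core pipeline programs are executable
# below $RPATH/bin (assumed layout; default root /usr/local/roche2gap).
check_roche2gap_tools() {
    local root=${RPATH:-/usr/local/roche2gap} tool missing=0
    for tool in sff_dump align_to_scf roche454ace2caf.pl caf2gap; do
        if [ ! -x "$root/bin/$tool" ]; then
            echo "not executable: $root/bin/$tool"
            missing=1
        fi
    done
    return "$missing"
}
```

    Run it as "RPATH=/my/install/dir check_roche2gap_tools"; it prints one line
    per missing program and returns 0 when all four are found.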


  • What else did you get ?
  • Other perl utilities:
    • ace2caf_minimal.pl - Converts ace into caf without any trace/quality information, so it is very fast.
    • ace_split.pl - Splits one big ace file into separate ace files with one contig each.
    • ace_contig_coverage.pl - Generates a thumbnail graphic (png) of the coverage of each contig; requires external gnuplot.
    • ace_status.pl - Some statistics (padded, unpadded) about an ace file.
  • External links for convenience: