Strategy, Material, and Methods


General Strategy

The Dictyostelium genome is being sequenced chromosome by chromosome using a chromosomal shotgun strategy. The six Dictyostelium chromosomes, which range in size from four to seven Mb, are being separated by pulsed-field gel electrophoresis (PFGE) by Edward Cox, Princeton University. Chromosomal libraries with insert sizes between 1 and 4 kb are being generated by the Sanger Centre using pUC18 as the vector. For shotgun sequencing, both forward and reverse reads are produced from each clone picked at random from the chromosome-enriched libraries. These read pairs will be a useful mapping resource for assessing contig order. An estimated 20,000 to 25,000 shotgun reads will be needed per Mb of DNA.
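
The read numbers above translate into sequence coverage once a read length is fixed. The sketch below does that arithmetic under an assumed average usable read length of 500 bp (READ_LENGTH_BP is an illustrative assumption, not a figure from this project). It also gives the idealized Lander-Waterman estimate, exp(-c), of the fraction of bases left unsequenced at c-fold coverage, which assumes uniformly random read placement.

```python
import math

MB = 1_000_000
# Assumed average usable read length; this value is NOT given in the text.
READ_LENGTH_BP = 500


def coverage(reads_per_mb: int, read_length_bp: int = READ_LENGTH_BP) -> float:
    """Average fold coverage = sequenced bases / target bases."""
    return reads_per_mb * read_length_bp / MB


def reads_for_chromosome(chrom_size_mb: float, reads_per_mb: int) -> int:
    """Total shotgun reads needed for one chromosome at the given read density."""
    return round(chrom_size_mb * reads_per_mb)


def fraction_uncovered(fold_coverage: float) -> float:
    """Idealized Lander-Waterman estimate of the fraction of bases hit by no read."""
    return math.exp(-fold_coverage)


if __name__ == "__main__":
    for rpm in (20_000, 25_000):
        c = coverage(rpm)
        print(f"{rpm} reads/Mb -> {c:.1f}x, uncovered ~{fraction_uncovered(c):.1e}")
    # Smallest (4 Mb) and largest (7 Mb) chromosomes at the low and high read densities
    print(reads_for_chromosome(4, 20_000), reads_for_chromosome(7, 25_000))
```

With the assumed 500 bp reads, 20,000 to 25,000 reads per Mb corresponds to 10x to 12.5x coverage, and a single chromosome would need between 80,000 (4 Mb at 20,000 reads/Mb) and 175,000 (7 Mb at 25,000 reads/Mb) reads.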


Clone Resources for Sequencing

As a service to the scientific community, raw D. discoideum sequence data are released daily.
The raw sequence data are generated from four libraries:

The clone length distributions of the libraries used have been analysed statistically, and their content of extrachromosomal DNA has been determined.
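
An analysis of this kind can be sketched as follows. The insert sizes and the per-read origin labels are hypothetical inputs: the sketch assumes that each read has already been assigned an origin label, with "mito" and "rdna" standing in for reads from extrachromosomal DNA. The 1 to 4 kb window reflects the nominal insert size of the chromosomal libraries. This is an illustration, not the analysis actually used for these libraries.

```python
import statistics
from typing import Iterable

# Hypothetical labels for reads that were assigned to extrachromosomal DNA.
EXTRACHROMOSOMAL = {"mito", "rdna"}


def insert_size_summary(sizes_bp: list[int], lo: int = 1_000,
                        hi: int = 4_000) -> dict[str, float]:
    """Basic statistics of a clone insert-size distribution (needs >= 2 clones)."""
    return {
        "n": len(sizes_bp),
        "mean": statistics.mean(sizes_bp),
        "median": statistics.median(sizes_bp),
        "stdev": statistics.stdev(sizes_bp),
        # Fraction of clones whose insert lies within the nominal size window
        "in_window": sum(lo <= s <= hi for s in sizes_bp) / len(sizes_bp),
    }


def extrachromosomal_fraction(read_origins: Iterable[str]) -> float:
    """Fraction of reads whose origin label marks extrachromosomal DNA."""
    origins = list(read_origins)
    return sum(o in EXTRACHROMOSOMAL for o in origins) / len(origins)


if __name__ == "__main__":
    print(insert_size_summary([1200, 1800, 2500, 3100, 4600]))
    print(extrachromosomal_fraction(["chr", "chr", "mito", "rdna", "chr"]))
```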


Other Clone Resources

A YAC library, generated by Adam Kuspa (BCM) and William Loomis (UCSD, La Jolla), is already available. The YAC clones have been ordered along the chromosomes (Kuspa, A. and Loomis, W. F., 1996, Proc. Natl. Acad. Sci. USA 93, 5562-5566), and YAC 'skims' of a minimal tiling set of overlapping clones can be used to guide the assembly process.
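
Selecting a minimal tiling set from clones that have already been ordered along a chromosome is an interval-cover problem. The sketch below uses the standard greedy interval-cover algorithm: at each step, among the clones that start at or before the edge of the region covered so far, it picks the one that extends furthest. The Clone type and its kb coordinates are hypothetical; the published YAC map orders clones rather than giving exact endpoints, so this only illustrates the idea.

```python
from typing import NamedTuple


class Clone(NamedTuple):
    name: str
    start: int  # hypothetical left end along the chromosome (kb)
    end: int    # hypothetical right end (kb)


def minimal_tiling_path(clones: list[Clone], region_start: int,
                        region_end: int) -> list[Clone]:
    """Greedy interval cover of [region_start, region_end] with the fewest clones.

    Raises ValueError if the clones leave a gap in the region.
    """
    ordered = sorted(clones, key=lambda c: c.start)
    path: list[Clone] = []
    covered = region_start
    i = 0
    while covered < region_end:
        best = None
        # Among clones reaching back to the covered edge, keep the one reaching furthest
        while i < len(ordered) and ordered[i].start <= covered:
            if best is None or ordered[i].end > best.end:
                best = ordered[i]
            i += 1
        if best is None or best.end <= covered:
            raise ValueError(f"gap in clone coverage at {covered} kb")
        path.append(best)
        covered = best.end
    return path
```

For example, of five clones spanning 0 to 900 kb, the greedy choice keeps only the clones needed to bridge the whole region, and the remaining clones are redundant for skimming.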

A BAC or circular YAC library is being created by Pieter deJong. A low-copy plasmid library with insert sizes between 3 and 5 kb is being prepared by Adam Kuspa. Both genomic libraries will be used later in the project for gap closure and as a resource for clone distribution.


Metric and Physical Map

HAPPY mapping is being carried out by Paul Dear, MRC, Cambridge. A high-density map (8 kb resolution) of chromosome 6 is being created using STS markers; a less dense map (50 to 70 kb resolution) will be made of the rest of the genome. This will establish an independent metric map that supplements the existing physical map, which has a resolution of approximately 100 kb.
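
In outline, HAPPY mapping scores each marker for presence or absence across a panel of aliquots, each containing roughly one haploid genome's worth of randomly broken DNA. Markers that lie close together tend to be found in the same aliquots. The sketch below computes only the basic co-segregation signal for one marker pair: the observed fraction of aliquots positive for both, compared with the fraction expected if the two markers were unlinked. The boolean panel scores are hypothetical input. Building an actual map requires a fuller statistical model that converts co-segregation into distances and marker order, and this sketch does not attempt that.

```python
def cosegregation(a: list[bool], b: list[bool]) -> tuple[float, float]:
    """Observed vs. expected fraction of aliquots positive for both markers.

    a[i] / b[i] is True if marker A / B was detected in aliquot i of the panel.
    The expected value assumes the markers segregate independently (no linkage).
    """
    if len(a) != len(b) or not a:
        raise ValueError("both markers must be scored on the same non-empty panel")
    n = len(a)
    observed = sum(x and y for x, y in zip(a, b)) / n
    expected = (sum(a) / n) * (sum(b) / n)
    return observed, expected


if __name__ == "__main__":
    a = [bool(x) for x in (1, 1, 0, 1, 0, 0, 1, 0)]
    b = [bool(x) for x in (1, 1, 0, 1, 0, 0, 0, 0)]
    obs, exp = cosegregation(a, b)
    # An observed value well above the expected one suggests linkage
    print(f"observed={obs:.4f} expected={exp:.4f} excess={obs - exp:.4f}")
```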


Gene Prediction

Attempts were made to use GlimmerM (TIGR) to predict genes in Dictyostelium discoideum. GlimmerM is a eukaryotic derivative of the prokaryotic gene finder Glimmer (TIGR) and was optimized for gene prediction in the malaria parasite genome. Since Plasmodium falciparum, like Dictyostelium discoideum, has an AT-rich genome, and since intron frequency is similar in both organisms, using GlimmerM for gene prediction in Dictyostelium discoideum seemed reasonable. However, it turned out that GlimmerM could be adapted to a new organism only to a limited extent, because the training options available to public users are restricted to codon usage adjustments.
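
A codon usage adjustment starts from a codon usage table derived from known coding sequences. The sketch below computes relative codon frequencies from a set of CDS strings, together with AT content, the property that makes Plasmodium falciparum and Dictyostelium discoideum comparable. The input format is an assumption (plain, in-frame CDS strings starting at the first codon), and the sketch does not reproduce GlimmerM's own training input format.

```python
from collections import Counter


def codon_usage(cds_list: list[str]) -> dict[str, float]:
    """Relative frequency of each codon, read in frame from the first base of each CDS."""
    counts: Counter = Counter()
    for cds in cds_list:
        seq = cds.upper()
        for i in range(0, len(seq) - 2, 3):  # ignore a trailing partial codon
            codon = seq[i:i + 3]
            if set(codon) <= set("ACGT"):  # skip codons with ambiguous bases
                counts[codon] += 1
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def at_content(seq: str) -> float:
    """Fraction of A and T in a sequence (AT richness)."""
    s = seq.upper()
    return (s.count("A") + s.count("T")) / len(s)


if __name__ == "__main__":
    usage = codon_usage(["ATGAAATAA", "atgaagtaa"])
    # Compare the two lysine codons, AAA (AT-rich) vs. AAG
    print(usage["AAA"], usage["AAG"], at_content("ATGAAATAA"))
```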

The gene finder program geneid (IMIM, Barcelona) proved to be a flexible, open framework for implementing gene prediction in Dictyostelium discoideum. Training was based on a set of 130 informative, non-redundant single-gene contigs deposited in the GenBank database. The trained components were the Markov model for scoring coding probability, the matrices for splice sites and for translation start and stop, and the overall scoring balance. In a later step, a set of multi-gene contigs was used to focus training on correct discrimination between introns and intergenic regions, a criterion that has been ignored in many benchmark studies.
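
The idea behind Markov-model scoring of coding probability can be illustrated with a minimal sketch: train one fixed-order Markov chain on coding sequence and one on non-coding sequence, then score a candidate region by the log-likelihood ratio of the two models. This is not geneid's parameterization. Practical gene finders typically use higher-order, codon-position-specific models, and the order k=2 and the pseudocount used here are illustrative assumptions.

```python
import math
from itertools import product

ALPHABET = "ACGT"
Model = dict[str, dict[str, float]]


def train_markov(seqs: list[str], k: int, pseudocount: float = 1.0) -> Model:
    """Estimate P(next base | preceding k bases) with additive (pseudocount) smoothing."""
    counts = {"".join(ctx): {b: pseudocount for b in ALPHABET}
              for ctx in product(ALPHABET, repeat=k)}
    for seq in seqs:
        s = seq.upper()
        for i in range(k, len(s)):
            ctx, nxt = s[i - k:i], s[i]
            if ctx in counts and nxt in ALPHABET:  # skip ambiguous bases such as N
                counts[ctx][nxt] += 1
    model: Model = {}
    for ctx, row in counts.items():
        total = sum(row.values())
        model[ctx] = {b: c / total for b, c in row.items()}
    return model


def log_odds(seq: str, coding: Model, noncoding: Model, k: int) -> float:
    """Log-likelihood ratio: > 0 favours the coding model, < 0 the non-coding one."""
    s = seq.upper()
    score = 0.0
    for i in range(k, len(s)):
        ctx, nxt = s[i - k:i], s[i]
        if ctx in coding and nxt in ALPHABET:
            score += math.log(coding[ctx][nxt]) - math.log(noncoding[ctx][nxt])
    return score
```

In a real setting the two training sets would come from annotated exons and from introns or intergenic regions, such as the GenBank contigs described above. The toy sequences in a quick check only need to differ clearly in composition.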


Figure. Comparison of gene predictor results on the genomic sequence of the trfA gene (GenBank Acc. No. AF009080).
annotated: original annotation from the GenBank entry. Box colors reflect the frame phase of the CDS at the exon border.
GenScan_At: prediction with GenScan using the Arabidopsis thaliana configuration. Box heights correspond to the scores of the predicted exons; box colors reflect the frame phase of the CDS at the exon border.
GlimmerM: prediction with GlimmerM after optimization for Dictyostelium discoideum codon usage.
geneid: prediction with geneid after training for Dictyostelium discoideum. Box heights correspond to the scores of the predicted exons; box colors reflect the frame phase of the CDS at the exon border.

The Dictyostelium discoideum parameter file for geneid is available for download. The program itself is available from IMIM, Barcelona.


See our guidelines on use of data in publications



Institute of Biochemistry I, Cologne
Angelika A. Noegel; Ludwig Eichinger
Dept. Genome Analysis, IMB Jena
Gernot Glöckner; Matthias Platzer