The mapping procedure

We started the assembly with reads matching genes that had been mapped to chromosome 2 (seeds). We then included reads with a higher-than-average frequency of occurrence and extended the contigs through repeated assembly cycles. Many of the contigs generated in the assembly process were linked by incorporating read-pair information: clones spanning the sequencing gaps between neighbouring contigs defined scaffolds. Sequence and cYAC mapping information as well as HAPPY map linkage data were used to reconstruct the chromosomal map. In this way, the position of most scaffolds, with a total length of 6.5 Mb, could be determined. Only four scaffolds totalling 0.6 Mb of sequence could not be placed onto the HAPPY map. Of these, two are presumably located at the ends of the mapped portion. In addition, the assembly contains 71 unlinked orphan contigs amounting to 0.41 Mb, which consist mainly of fragments of complex repetitive elements.
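The way read-pair information groups contigs into scaffolds can be illustrated with a minimal sketch. It is not the software used for this assembly: the function name build_scaffolds and the input layout (one tuple per clone, naming the contigs that hold its two end reads) are assumptions made for illustration. Real scaffolding also takes read orientation, clone insert size and the number of supporting clones into account, all of which are omitted here.

```python
from collections import defaultdict

def build_scaffolds(contigs, read_pairs):
    """Group contigs into scaffolds (hypothetical helper).

    contigs    -- iterable of contig names
    read_pairs -- iterable of (contig_a, contig_b) tuples, one per clone,
                  naming the contigs that contain the clone's two end reads
    Two contigs end up in the same scaffold if a chain of clones links them.
    """
    parent = {c: c for c in contigs}

    def find(c):
        # Follow parent pointers to the representative, halving the path.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in read_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # The clone spans a gap between two contigs: merge their groups.
            parent[root_b] = root_a

    groups = defaultdict(list)
    for c in contigs:
        groups[find(c)].append(c)
    return sorted(sorted(group) for group in groups.values())


contigs = ["c1", "c2", "c3", "c4", "c5"]
# ("c4", "c4") is a clone with both reads in the same contig: no new link.
read_pairs = [("c1", "c2"), ("c2", "c3"), ("c4", "c4")]
scaffolds = build_scaffolds(contigs, read_pairs)
print(scaffolds)
```

In this toy input, c1, c2 and c3 form one scaffold, while c4 and c5 remain unlinked, analogous to the orphan contigs mentioned above.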

Gap closure and associated problems

Three kinds of gaps can be defined:

sequence gaps: gaps spanned by one or more sequencing clones
clone gaps: gaps at the ends of contigs, where neighbouring contigs may be linked by HAPPY map markers
repeat gaps: gaps caused by unresolvable complex repetitive element loci

Closure of 809 sequence gaps showed that none was longer than 350 bp, with a mean of around 50 bp per gap. If we assume that the remaining sequence gaps are in the same size range, we estimate that these 95 gaps will sum to approximately 10 kb. Closure of the remaining sequence gaps nevertheless failed because they lack specific sites for oligonucleotides that could be used in primer walking. To illustrate why these sequence gaps resist repeated sequencing attempts, we compared the length and distribution of homopolymer runs in D. discoideum chromosome 2 with those in the two fully sequenced chromosomes of Plasmodium falciparum, an organism with a comparably high A+T content. We found that the sequenced portions of the D. discoideum genome contain more and longer homopolymer runs than those of P. falciparum. The results can be viewed as a graphical representation.
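A comparison of homopolymer runs of the kind described above could be sketched as follows. This is an illustrative outline only, not the analysis actually performed: the function names, the min_length threshold and the normalisation per kilobase are assumptions, and the example sequence is a made-up toy string rather than real genomic data.

```python
from collections import Counter
from itertools import groupby

def homopolymer_run_lengths(sequence, min_length=2):
    """Count homopolymer runs (hypothetical helper).

    Returns a Counter mapping (base, run_length) to the number of runs of
    that base and length, considering only A, C, G, T and runs of at least
    min_length bases.
    """
    counts = Counter()
    for base, run in groupby(sequence.upper()):
        length = sum(1 for _ in run)
        if base in "ACGT" and length >= min_length:
            counts[(base, length)] += 1
    return counts

def runs_per_kb(sequence, min_length=2):
    """Number of homopolymer runs per 1000 bases, so that sequences of
    different length can be compared."""
    if not sequence:
        return 0.0
    total_runs = sum(homopolymer_run_lengths(sequence, min_length).values())
    return 1000.0 * total_runs / len(sequence)


toy = "AAAATTTGCAAAA"  # toy sequence, not real data
runs = homopolymer_run_lengths(toy, min_length=3)
density = runs_per_kb(toy, min_length=3)
print(runs, density)
```

Applied to each chromosome in turn, the resulting length distributions and run densities could be tabulated or plotted side by side.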

Of the 107 clone gaps observed in the 6.5 Mb linked portion of the assembly, 18 result from non-resolvable, inserted complex repetitive elements whose sizes cannot be accurately determined because of truncations and the complex nature of many repeat loci. Twenty inverse PCR attempts led to the closure of 2 clone gaps. Three additional clone gaps were closed by careful assembly of repetitive regions. A search for additional sequence information in the low-quality data converted 5 clone gaps into sequence gaps. Attempts to span clone gaps with PCR products gave no further results.

Importantly, none of the gaps closed so far contained sequences with coding potential. We conclude therefore that, despite the gaps, we have most likely defined all of the coding potential on chromosome 2.



Dept. Genome Analysis, IMB Jena
Gernot Glöckner; Matthias Platzer
Institute of Biochemistry I, Cologne
Angelika A. Noegel; Ludwig Eichinger