Drosophila pseudoobscura genome annotation from FlyBase Release 1.04.
See http://flybase.net/annot/dpse_release1.04.txt
Date: 20041216

DATA CONTENTS

Feature                      count
------------------------------------------------------------
CDS                           9775
chromosome_arm                   2
gene                         11974
gene:genewise                17810
gene:genscan                 16778
gene:twinscan                17943
golden_path                     15
golden_path_fragment          2547
intron                        7953
mRNA                          9775
mRNA:genewise                17810
mRNA:genscan                 16778
mRNA:twinscan                17943
match:blastn:na_dbEST.dpse   27519
match:blastz                 16480
orthologous_region           11880
scaffold                      2802
source                          17
syntenic_region               1116
------------------------------------------------------------

Note on features:
  * 60 syntenic_region features are withheld from this release 1.03 until the missing data are located.

Largest assembly units are:
  chromosome_arm = 2, 3
  golden_path    = 4_group1, 4_group2, 4_group3, 4_group4, 4_group5,
                   XL_group1a, XL_group1e, XL_group3b, XL_group3a,
                   XR_group3a, XR_group5, XR_group6, XR_group8, XR_group9
  U

The 'U' chromosome is the unordered collection of contigs that are not assigned to any other unit. These contigs are artificially ordered by ID in one collection for presentation purposes only. Do not be misled by sequence positions on U: they are not valid outside the individual contig.

Computed genes are assigned names with a 'GA' prefix for this species, and each carries an FBgn ID as its gene ID. See the FlyBase document NOMENCLATURE FOR ANNOTATION-BASED GENE SYMBOLS IN DROSOPHILIDAE.

Computational features are those with the :genewise, :genscan or :twinscan suffix, plus the match: features.

Data are from the Postgres Chado database, release 1.04, 20041216. A copy is at ftp://flybase.net/genomes/Drosophila_pseudoobscura/dpse_r10_20041216/pgsql/

BULK FILE SET
See ftp://flybase.net/genomes/Drosophila_pseudoobscura/current/
  blast/  - NCBI BLAST database set for selected fasta/ feature sets
  dna/    - raw-format DNA files, one per chromosome arm
  fasta/  - DNA and protein data per chromosome and feature type
  gff/    - GFF v3 standard feature files per chromosome
  gnomap/ - Gnomap standard feature files per chromosome (these drive the genome map views)
The last two (gff/ and gnomap/) contain the chromosome locations of the features listed above.
-------------------------------------------------

COMPUTATIONAL ANALYSIS OVERVIEW

Date: Wed, 27 Oct 2004 14:22:38 -0400 (EDT)
From: Peili Zhang

Here is a brief description of what FlyBase has done in comparing the D. pseudoobscura genome (abbreviated dpse below) against D. melanogaster (abbreviated dmel below), and in generating the first computationally derived version of the dpse genome annotation.

We started by mapping the locations of putative orthologs of dmel genes on the dpse genome. To do this, we first selected one protein isoform per gene from the dmel annotation, then ran TBLASTN against the dpse WGS contigs using the selected dmel protein set as the query. This gave us putative ortholog locations on the dpse genome for more than 12,000 dmel genes. These locations were then confirmed or modified using the synteny information of the genes on the dmel genome. Finally, we generated the syntenic blocks between dmel and dpse and extended them further using the blastz HSPs between the two genomes.

To generate the first version of the dpse genome annotation computationally, we created a gene feature at each putative ortholog position. In addition, three gene predictors, Twinscan, Genscan and Genewise, were run independently on the dpse genome. After semi-automatic filtering to retain only one gene prediction per locus, most of the remaining predicted proteins are reciprocal best hits of their dmel counterparts.
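The reciprocal-best-hit criterion mentioned above can be illustrated with a small Python sketch. It assumes the BLAST results in both directions (dpse predicted proteins against dmel proteins, and the reverse) have already been reduced to (query, subject, bit score) tuples. The function names, the tuple layout and the toy IDs are hypothetical and do not describe FlyBase's actual pipeline.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical hit layout: (query_id, subject_id, bit_score)
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Return the highest-scoring subject for each query (ties keep the first hit seen)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward: Iterable[Hit], reverse: Iterable[Hit]) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best forward hit AND a is b's best reverse hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return {(a, b) for a, b in fwd.items() if rev.get(b) == a}


if __name__ == "__main__":
    # Toy scores with hypothetical IDs: dpse proteins vs dmel proteins, and back.
    dpse_vs_dmel = [("dpse_1", "dmel_A", 900.0), ("dpse_1", "dmel_B", 400.0),
                    ("dpse_2", "dmel_B", 300.0)]
    dmel_vs_dpse = [("dmel_A", "dpse_1", 880.0), ("dmel_B", "dpse_1", 410.0),
                    ("dmel_B", "dpse_2", 290.0)]
    # dpse_2's best hit is dmel_B, but dmel_B's best hit is dpse_1, so only one pair survives.
    print(sorted(reciprocal_best_hits(dpse_vs_dmel, dmel_vs_dpse)))  # prints [('dpse_1', 'dmel_A')]
```

Breaking ties by first occurrence is a simplification; a real comparison would normally also apply score or e-value cutoffs before picking best hits.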
Next, we checked for overlap between the dpse gene features created at the TBLASTN-derived putative ortholog positions and the gene predictions, and attached a predicted gene model to about 90% of the gene loci on the dpse genome. This completed dpse genome annotation release 1.0, which is now publicly available through GenBank. Links to the CON records can be found at the bottom of the WGS project master record AADE00000000.

Please note that only the gene models annotated on the dpse genome have been submitted to GenBank. The orthologous regions and syntenic blocks derived primarily from TBLASTN, the blastz HSPs, the alignments of dpse ESTs onto the dpse genome, the unfiltered gene predictions from all three predictors, and similar data will be publicly available on the official FlyBase web site (http://www.flybase.org).
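The overlap step that attaches a predicted gene model to each ortholog-derived gene locus can be sketched in Python as below. The text does not say how one model was chosen when several predictions overlap a locus; this sketch assumes, purely for illustration, that the prediction sharing the most bases with the locus wins. Strand is ignored, coordinates are taken as 1-based and inclusive (as in GFF), and the `Feature` class, function names and IDs are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Feature:
    """A located feature with 1-based, inclusive coordinates (GFF-style)."""
    id: str
    seqid: str
    start: int
    end: int


def overlap_len(a: Feature, b: Feature) -> int:
    """Bases shared by two features; 0 if they lie on different sequences or are disjoint."""
    if a.seqid != b.seqid:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def attach_models(loci: List[Feature], predictions: List[Feature]) -> Dict[str, Optional[str]]:
    """Map each locus ID to the ID of its most-overlapping prediction (None if nothing overlaps).

    Largest overlap wins: an assumed selection rule, not one stated in the release notes.
    """
    result: Dict[str, Optional[str]] = {}
    for locus in loci:
        best_id: Optional[str] = None
        best_len = 0
        for pred in predictions:
            n = overlap_len(locus, pred)
            if n > best_len:
                best_id, best_len = pred.id, n
        result[locus.id] = best_id
    return result


if __name__ == "__main__":
    loci = [Feature("locus1", "XR_group6", 1000, 2000),
            Feature("locus2", "XR_group6", 5000, 6000)]
    preds = [Feature("twinscan_7", "XR_group6", 900, 1500),   # 501 bp overlap with locus1
             Feature("genewise_3", "XR_group6", 1200, 2100),  # 801 bp overlap with locus1
             Feature("genscan_9", "2", 1000, 2000)]           # different sequence: no overlap
    print(attach_models(loci, preds))  # prints {'locus1': 'genewise_3', 'locus2': None}
```

The nested loop is quadratic, which is fine for a sketch; at genome scale one would sort features by start or build an interval index per sequence. The share of loci that receive a model (about 90% in the text) is then the fraction of non-None values in the result.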