SEED
From Genomic Standards Consortium
Introduction
Genomic Rosetta Stone (GRS) and GRS Resolver
Participating in GRS and GRS Resolver
Identifier Policy
Identifier Naming Convention
SEED genome identifiers combine an NCBI taxonomy identifier (taxid) with a version number.
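As an illustration of the naming convention, the following minimal sketch splits a genome identifier into its two parts. It assumes the two parts are joined by a dot (for example `83333.1`); the separator, the example value, and the names `SeedGenomeId` and `parse_genome_id` are assumptions made for this sketch and are not stated on this page.

```python
from typing import NamedTuple


class SeedGenomeId(NamedTuple):
    taxid: int    # NCBI taxonomy identifier
    version: int  # version number attached to that taxid


def parse_genome_id(genome_id: str) -> SeedGenomeId:
    """Split an identifier assumed to have the form '<taxid>.<version>'."""
    taxid, sep, version = genome_id.strip().partition(".")
    if not sep or not taxid.isdigit() or not version.isdigit():
        raise ValueError(f"not a taxid.version identifier: {genome_id!r}")
    return SeedGenomeId(int(taxid), int(version))


# Illustrative value only; the dot-separated layout is an assumption.
print(parse_genome_id("83333.1"))
```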
Identifier Mapping Availability
The following file was sent in April 2006 and contains a mapping of SEED identifiers to INSDC Project identifiers:
Previous communication suggests that a mapping of SEED identifiers to GOLD identifiers is available.
Mapping of SEED to NMPDR, UniProt, and GI numbers
- File description: ftp://ftp.theseed.org/misc/Data/idmapping/README_Corr.txt
- File download: ftp://ftp.theseed.org/misc/Data/idmapping/linking_table_nmpdr
The following text is adapted from [1]:
Filename: linking_table_nmpdr
This file contains a table of corresponding IDs from NMPDR, UniProt, and GI numbers. It is a tab-delimited file with the following columns:
- SEED/FIG ID
- NMPDR ID (http://www.nmpdr.org/)
- UniProt AC
- List of GI numbers, separated by semicolons
- Functional assignment (i.e., the function assigned by NMPDR)
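The column layout above can be read with a few lines of code. The following is a minimal sketch, assuming the file has no header row and that every data line carries all five columns; the names `LinkRow` and `read_linking_table` and the sample row are invented for illustration and do not come from the file itself.

```python
import csv
import io
from typing import Iterator, List, NamedTuple


class LinkRow(NamedTuple):
    fig_id: str            # SEED/FIG ID
    nmpdr_id: str          # NMPDR ID
    uniprot_ac: str        # UniProt AC
    gi_numbers: List[str]  # GI numbers, split on semicolons
    function: str          # functional assignment


def read_linking_table(handle) -> Iterator[LinkRow]:
    """Yield one LinkRow per tab-delimited line with at least five columns."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if len(fields) < 5:
            continue  # skip blank or incomplete lines
        fig_id, nmpdr_id, uniprot_ac, gi_field, function = fields[:5]
        gis = [gi.strip() for gi in gi_field.split(";") if gi.strip()]
        yield LinkRow(fig_id, nmpdr_id, uniprot_ac, gis, function)


# Made-up sample row; real values will differ.
sample = "fig|83333.1.peg.1\tNMPDR_EXAMPLE\tP00000\t111;222\thypothetical protein\n"
for row in read_linking_table(io.StringIO(sample)):
    print(row.fig_id, row.uniprot_ac, row.gi_numbers)
```

To read the downloaded file instead of the sample string, pass an open file handle, for example `open("linking_table_nmpdr", newline="")`.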
Computing the correlation between NMPDR and UniProt:
First, an attempt is made to identify corresponding organisms. This is done by forming sets of proteins that share an identical sequence. We then consider only sets that contain a single UniProt ID and a single FIG ID. For each such set, we form the tuple
[UniProt genome, FIG genome].
We accumulate counts of such pairs, tabulate them, and use the counts to estimate the most likely correspondence between genomes (i.e., the correspondence between how UniProt identifies its genomes and how FIG identifies its own).
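The genome-matching step described above can be sketched as follows. This is a simplified illustration, not the code used to build the file: it assumes each identical-sequence set is a list of `(source, genome, protein_id)` tuples, and it assumes "most likely" means the FIG genome with the highest pair count for each UniProt genome, which the text does not specify. All names and sample values are hypothetical.

```python
from collections import Counter


def genome_pair_counts(sequence_sets):
    """Count (UniProt genome, FIG genome) pairs over sets that hold
    exactly one UniProt ID and exactly one FIG ID."""
    counts = Counter()
    for members in sequence_sets:
        uniprot = [m for m in members if m[0] == "uniprot"]
        fig = [m for m in members if m[0] == "fig"]
        if len(uniprot) == 1 and len(fig) == 1:
            counts[(uniprot[0][1], fig[0][1])] += 1
    return counts


def most_likely_genome_map(counts):
    """Assumed rule: keep the highest-count FIG genome per UniProt genome."""
    best = {}
    for (up_genome, fig_genome), n in counts.items():
        if up_genome not in best or n > best[up_genome][1]:
            best[up_genome] = (fig_genome, n)
    return {up: fig for up, (fig, _) in best.items()}


# Made-up example data.
sets = [
    [("uniprot", "ECOLI", "P1"), ("fig", "83333.1", "fig|83333.1.peg.1")],
    [("uniprot", "ECOLI", "P2"), ("fig", "83333.1", "fig|83333.1.peg.2")],
    [("uniprot", "ECOLI", "P3"), ("fig", "562.1", "fig|562.1.peg.9")],
    # Ignored: two UniProt IDs in the same set.
    [("uniprot", "ECOLI", "P4"), ("uniprot", "ECOLI", "P5"),
     ("fig", "83333.1", "fig|83333.1.peg.3")],
]
print(most_likely_genome_map(genome_pair_counts(sets)))
```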
Then, the correspondence between IDs is formed by reconsidering the sets that share an identical sequence. When such a set contains a single UniProt ID from a given UniProt genome and a single FIG ID from the corresponding FIG genome, the UniProt ID and the FIG ID are taken to represent the same protein.
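A minimal sketch of this second pass is shown below. It assumes the genome correspondence is already available as a dictionary from UniProt genome to FIG genome, and that each identical-sequence set is a list of `(source, genome, protein_id)` tuples; the function name `pair_ids` and the sample data are hypothetical. Unlike the first pass, a set may contain IDs from other genomes, as long as the two corresponding genomes each contribute exactly one ID.

```python
def pair_ids(sequence_sets, genome_map):
    """Return (UniProt ID, FIG ID) pairs taken to be the same protein."""
    pairs = []
    for members in sequence_sets:
        for up_genome, fig_genome in genome_map.items():
            uniprot = [pid for src, genome, pid in members
                       if src == "uniprot" and genome == up_genome]
            fig = [pid for src, genome, pid in members
                   if src == "fig" and genome == fig_genome]
            # Accept only an unambiguous one-to-one match within this set.
            if len(uniprot) == 1 and len(fig) == 1:
                pairs.append((uniprot[0], fig[0]))
    return pairs


# Made-up example data.
example_sets = [
    # Paired: one ECOLI UniProt ID and one 83333.1 FIG ID (SHIFL is ignored).
    [("uniprot", "ECOLI", "P1"), ("uniprot", "SHIFL", "Q1"),
     ("fig", "83333.1", "fig|83333.1.peg.1")],
    # Not paired: two UniProt IDs from the same genome.
    [("uniprot", "ECOLI", "P2"), ("uniprot", "ECOLI", "P3"),
     ("fig", "83333.1", "fig|83333.1.peg.2")],
]
print(pair_ids(example_sets, {"ECOLI": "83333.1"}))
```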
Once this correspondence is formed, we can use (for any pair of corresponding FIG and UniProt IDs) the BioThesaurus data gathered by PIR to establish the NCBI taxonomy and the GI numbers that should be associated with the FIG ID.
The PIR ID mapping service is available at http://pir.georgetown.edu/pirwww/search/idmapping.shtml. All GI numbers are based on the mapping between UniProt AC and GI number. The mapping file can be found at ftp://ftp.pir.georgetown.edu/databases/idmapping.
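The final association of GI numbers with FIG IDs amounts to a lookup through the UniProt AC. The sketch below shows that join on in-memory dictionaries; it makes no assumption about the layout of the PIR mapping file, and the function name `attach_gi_numbers` and all sample values are invented for illustration.

```python
def attach_gi_numbers(fig_to_uniprot, uniprot_to_gis):
    """Map each FIG ID to the GI numbers of its corresponding UniProt AC.

    FIG IDs whose UniProt AC has no known GI numbers get an empty list.
    """
    return {
        fig_id: list(uniprot_to_gis.get(uniprot_ac, []))
        for fig_id, uniprot_ac in fig_to_uniprot.items()
    }


# Made-up example data.
fig_to_uniprot = {"fig|83333.1.peg.1": "P1", "fig|83333.1.peg.2": "P2"}
uniprot_to_gis = {"P1": ["111", "222"]}
print(attach_gi_numbers(fig_to_uniprot, uniprot_to_gis))
```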