GRS Resolver

From Genomic Standards Consortium

Introduction

To make best use of the Genomic Rosetta Stone, we aim to engineer a web-based resolution service to produce lists of links to all databases in which an instance of a particular genome occurs. This tool will function much like a ‘currency converter’; entering an identifier will return synonymous identifiers for a particular (set of) genome(s) or metagenome(s). We will make the mapping and code freely available.

Current Implementation

The Genome Catalogue now contains a pilot Resolver interface: http://gensc.org/gsc/gcat/xtr/rosetta-stone

This page describes how to access the Web services provided by the Resolver: Web_service_-_genomic_rosetta_stone

The current implementation does not use NCBI LinkOut and relies on a local genomic identifier mapping. It is proposed that the current implementation, including its web service, be deprecated once the GRS Resolver incorporating LinkOut is available.

GRS Resolver using NCBI LinkOut

At the 5th GSC workshop in December 2007, it was proposed that the GRS Resolver make use of the NCBI's LinkOut facility. LinkOut would be utilised to host and maintain a genomic identifier mapping. The GRS Resolver would query LinkOut to extract genomic identifiers, and would support queries against the identifier mapping that LinkOut does not currently offer, e.g. a query using a non-NCBI identifier.

Why is a GRS Resolver needed if NCBI LinkOut hosts the genomic ID mapping?

The principal reason is that the NCBI eLink web service that is used with LinkOut can only be queried using an identifier from an NCBI database, e.g. genome or genomeprj.

At the time of writing, February 2008, it is not possible to query LinkOut using information, including identifiers, contained in the external resources associated with an NCBI database identifier.

Summary - Functional Requirements, Procedure, Data Management

In summary, the GRS Resolver would require:

  • genomic database owners to register with NCBI LinkOut, and associate their datasets with the NCBI genome project (genomeprj) database identifier
  • the NCBI database identifier mapping for all NCBI genome project (genomeprj) identifiers to be downloaded using NCBI LinkOut eLink web service
  • the external resources mapping for all NCBI genome project (genomeprj) identifiers to be downloaded using NCBI LinkOut eLink web service
  • the downloaded identifier mappings to be hosted in a dedicated GRS database
  • local maintenance of the identifier mapping via regular downloads and housekeeping
  • web query interface to identifier mapping
  • web service query interface to identifier mapping
  • separate web service for Taverna, if appropriate
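As a rough illustration of the hosting and query requirements above, the following sketch stores an identifier mapping in an in-memory SQLite database and resolves one identifier to its synonyms. The table layout, the function names, and the sample GOLD and GCAT identifiers are assumptions made for illustration only; they are not part of the GRS design.

```python
import sqlite3

def create_store():
    """Create an in-memory store; a real deployment would use a persistent database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mapping ("
        " genomeprj_id TEXT NOT NULL,"
        " id_type TEXT NOT NULL,"
        " identifier TEXT NOT NULL,"
        " PRIMARY KEY (genomeprj_id, id_type, identifier))"
    )
    return conn

def load(conn, rows):
    """Insert (genomeprj_id, id_type, identifier) rows; re-loading a row is a no-op."""
    conn.executemany("INSERT OR IGNORE INTO mapping VALUES (?, ?, ?)", rows)
    conn.commit()

def resolve(conn, identifier):
    """Return all (id_type, identifier) pairs sharing a genome project with `identifier`."""
    cur = conn.execute(
        "SELECT DISTINCT m2.id_type, m2.identifier"
        " FROM mapping m1 JOIN mapping m2 ON m1.genomeprj_id = m2.genomeprj_id"
        " WHERE m1.identifier = ?"
        " ORDER BY m2.id_type, m2.identifier",
        (identifier,),
    )
    return cur.fetchall()

# Hypothetical sample rows; the GOLD and GCAT values are invented placeholders.
conn = create_store()
load(conn, [
    ("16718", "genomeprj", "16718"),
    ("16718", "GOLD", "GOLD-EXAMPLE-1"),
    ("16718", "GCAT", "GCAT-EXAMPLE-1"),
])
synonyms = resolve(conn, "GOLD-EXAMPLE-1")
```

Because `load` ignores rows that already exist, the same function could be reused for the regular downloads; removing identifiers that have disappeared upstream would need separate housekeeping.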


Functional Requirements

GRS Resolver Web Service

GRS Resolver Web Query Interface

  • Search
    • with identifier only
    • with identifier and specify identifier type, e.g. GCAT, GOLD, INSDC
      • a list of identifier types available in the GRS is provided
  • Query Result
    • Show identifier mapping in the browser
      • Link to resolve identifiers
      • Link to resource homepage
    • Link to XML file containing the identifier mapping
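The two search modes listed above could behave roughly as in the following sketch. It assumes a simple in-memory list of records keyed by identifier type; the record layout, the function names, and the GOLD and GCAT values are hypothetical.

```python
from typing import Optional

# Hypothetical mapping: each record groups synonymous identifiers by identifier type.
RECORDS = [
    {"genomeprj": "16718", "GOLD": "GOLD-EXAMPLE-1", "GCAT": "GCAT-EXAMPLE-1"},
    {"genomeprj": "1", "GOLD": "GOLD-EXAMPLE-2"},
]

def identifier_types(records):
    """List the identifier types present, as offered to the user when searching."""
    return sorted({id_type for record in records for id_type in record})

def search(records, identifier: str, id_type: Optional[str] = None):
    """Return records containing `identifier`, optionally restricted to one type."""
    hits = []
    for record in records:
        if id_type is None:
            # Identifier only: match against every identifier type.
            matched = identifier in record.values()
        else:
            # Identifier plus type: match against that type alone.
            matched = record.get(id_type) == identifier
        if matched:
            hits.append(record)
    return hits
```

Specifying the type matters when the same string is a valid identifier in more than one database: `search(RECORDS, "16718")` matches on any type, whereas `search(RECORDS, "16718", "GOLD")` only matches a GOLD identifier with that value.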

Procedure

Genomic Data Providers to Register with NCBI LinkOut

GRS LinkOut Tutorial

GRS LinkOut tutorial for genomic data providers, describing the process to register with NCBI LinkOut as an external resource provider.

NCBI Guidelines

Guidelines on how to register with NCBI LinkOut as an external resource provider:

  • All linking information is submitted by LinkOut providers - the owner or agent for the owner of the online resource.
  • LinkOut providers are responsible for maintaining their links.
  • To submit links to your resource, you will need to upload two XML files, an identity file and a resource file.
    • The identity file contains the information about your organization needed to list your resource(s) in LinkOut.
    • The resource file describes the Entrez records you will link from and contains the information that LinkOut needs to generate the links.

NCBI Guidelines - Quick Links

Genomic Databases as Candidates for NCBI LinkOut

The following databases have been identified as candidates for the GRS, and therefore will need to register with NCBI LinkOut as an external resource provider:

NCBI LinkOut : NCBI database identifier mapping download

  • Download the GOLD dataset (all data): http://genomesonline.org
  • For each NCBI genome project identifier defined in the GOLD dataset
    • query NCBI LinkOut via eLink web service and save XML documents returned
      • example URL
      • limit request to identifiers from the NCBI databases: books, structure, genome, taxonomy, geo, pubmed
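The download loop above might start from a URL builder like the one sketched here. The base URL and the `dbfrom` and `id` parameters follow the eLink example queries on this page; passing the restricted database list as a comma-separated `db` value, the output file naming, and the function names are assumptions that should be checked against the eLink documentation.

```python
from urllib.parse import urlencode

ELINK_BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"

# NCBI databases the request is limited to, as listed in the procedure above.
TARGET_DBS = ["books", "structure", "genome", "taxonomy", "geo", "pubmed"]

def elink_url(genomeprj_id, dbs=TARGET_DBS):
    """Build an eLink query URL for one genome project identifier."""
    params = {
        "dbfrom": "genomeprj",
        "id": str(genomeprj_id),
        "db": ",".join(dbs),  # assumption: eLink accepts a comma-separated list here
    }
    # Keep commas unescaped so the URL stays readable.
    return ELINK_BASE + "?" + urlencode(params, safe=",")

def output_filename(genomeprj_id):
    """Hypothetical naming scheme for the saved XML documents."""
    return f"genomeprj_{genomeprj_id}.xml"

# One URL per genome project identifier taken from the GOLD dataset.
urls = {gid: elink_url(gid) for gid in ["1", "16718"]}
```

Each URL would then be fetched and the response written to the file named by `output_filename`; throttling between requests is left out of the sketch.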

NCBI LinkOut : external resources mapping download

  • Download the GOLD dataset (all data): http://genomesonline.org
  • For each NCBI genome project identifier defined in the GOLD dataset
    • query NCBI LinkOut for external resources via eLink web service
    • Using GOLD and other genomic datasets, create XML documents that correspond to the NCBI LinkOut eLink DTD and include an identifier mapping.
      • Reasons:
        • currently there are genomic identifiers in GOLD and elsewhere that are not available via LinkOut e.g. GCAT, RDP, Straininfo.net
        • there are identifier mappings in GOLD that do not have an NCBI genome project identifier
    • Transform these XML documents to include the relevant XML fragment returned by the query to NCBI LinkOut.
    • Save XML documents locally
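The step that creates XML documents holding an identifier mapping could look roughly like the sketch below. It reuses the element names seen in eLink results (`eLinkResult`, `LinkSet`, `DbFrom`, `IdList`, `LinkSetDb`, `DbTo`, `LinkName`, `Link`, `Id`). Using a non-NCBI name such as GCAT as a `DbTo` value, the `LinkName` convention, and the sample GCAT identifier are assumptions; the output is not validated against the eLink DTD here.

```python
import xml.etree.ElementTree as ET

def build_link_set(genomeprj_id, mapping):
    """Build an eLinkResult-like document.

    `mapping` maps a target name (NCBI database or external resource)
    to the list of identifiers known for this genome project.
    """
    result = ET.Element("eLinkResult")
    link_set = ET.SubElement(result, "LinkSet")
    ET.SubElement(link_set, "DbFrom").text = "genomeprj"
    id_list = ET.SubElement(link_set, "IdList")
    ET.SubElement(id_list, "Id").text = str(genomeprj_id)
    for target, identifiers in sorted(mapping.items()):
        link_set_db = ET.SubElement(link_set, "LinkSetDb")
        ET.SubElement(link_set_db, "DbTo").text = target
        # Assumed naming convention, modelled on names such as genomeprj_taxonomy.
        ET.SubElement(link_set_db, "LinkName").text = f"genomeprj_{target}"
        for identifier in identifiers:
            link = ET.SubElement(link_set_db, "Link")
            ET.SubElement(link, "Id").text = identifier
    return result

# Hypothetical mapping; the GCAT identifier is an invented placeholder.
doc = build_link_set("1", {"GCAT": ["GCAT-EXAMPLE-1"], "taxonomy": ["240015"]})
xml_text = ET.tostring(doc, encoding="unicode")
```

Records that have no NCBI genome project identifier would need a different anchor than `IdList`, which this sketch does not attempt to solve.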

Data Management

Maintenance of the NCBI identifier mapping

It is assumed that responsibility for maintaining the identifier mapping provided via LinkOut lies with NCBI and with the database providers who have registered with the NCBI LinkOut service.

NCBI Policy - File Maintenance - Provider Responsibilities

Source: http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helplinkout.section.nonbib.File_Maintenance

Link providers are responsible for:

  • maintaining their LinkOut files
  • transferring any additions, changes or deletions of their links to NCBI
  • updating files and informing NCBI when access rights are changed
  • correcting broken or incorrect links in a timely manner

Providers may transfer new versions of current files or add new resource files at any time. It is the responsibility of the provider to keep files current and valid. Links are regenerated every day based on the resource files in each provider’s directory. Therefore, providers must delete obsolete files from their holdings directory.

Out of Scope for GRS Resolver?

Maintenance of the identifier mapping is considered outside the scope of the GRS Resolver, although this point is open for discussion, e.g.

  • does the GSC have a role to play in identifying new genomic databases that should register with the LinkOut service?
  • does the GSC have a part to play in chasing up database providers to update their data held in LinkOut, if it is found to be out of date?


Communication with LinkOut Team

The following page includes email discussions with the LinkOut team with respect to the identifier mapping:

Genomic Rosetta Stone and LinkOut

Hosting of the downloaded Genomic Identifier Mapping

Maintenance of the Local Genomic Identifier Mapping

Implementation

GRS Resolver Web Service

Proposal

20 February 2008

Proposal to return XML documents that correspond in part to the NCBI LinkOut eLink DTD. To illustrate, the prototype GRS Resolver web service incorporating NCBI LinkOut returns an XML document that is very similar to that returned by NCBI LinkOut:

http://gensc.org/gsc/gcat/xtr/services/v1/grs?query=16718

NCBI eLink DTD

Proposal 18 February 2008

Two XML schema files (with example documents) that define the XML documents returned by the GRS resolver are available for review/feedback (curator@ceh.ac.uk):


Prototype

The GRS Resolver prototype web service is available at:

http://gensc.org/gsc/gcat/xtr/services/v1/grs

Example: http://gensc.org/gsc/gcat/xtr/services/v1/grs?query=16718

NCBI eLink DTD
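A client for the prototype web service could be sketched as follows. The service URL and the `query` parameter come from the example above; the function names and the timeout are assumptions, and the fetch function is defined but not called here because the prototype may not be reachable.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

GRS_SERVICE = "http://gensc.org/gsc/gcat/xtr/services/v1/grs"

def grs_query_url(identifier):
    """Build the prototype web service URL for one identifier."""
    return GRS_SERVICE + "?" + urlencode({"query": identifier})

def fetch_mapping(identifier, timeout=30):
    """Fetch the XML document for `identifier` and return it as bytes."""
    with urlopen(grs_query_url(identifier), timeout=timeout) as response:
        return response.read()

example_url = grs_query_url("16718")
```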

GRS Resolver Web Query Interface

Prototype

A GRS Resolver prototype incorporating NCBI LinkOut is under active development.

The GRS Resolver prototype is available at:

http://gensc.org/gsc/gcat/xtr/grs

Mailing List

A dedicated mailing list exists for the discussion of issues concerning the Genomic Rosetta Stone and the GRS Resolver.

Subscribe

To subscribe visit the GSC SourceForge project site: http://sourceforge.net/mail/?group_id=153365

Mailing list archive

[gensc-identifier mailing list]

People

Members of the GSC who are involved in discussions concerning the GRS and GRS Resolver:

(subscribed to gensc-identifier mailing list on 21st February 2008)

Further Information

Genomic Rosetta Stone


NCBI LinkOut

Text from [eLink Help]:

LinkOut is a service of Entrez that allows you to link directly from PubMed and other Entrez databases to a wide range of information and services beyond the Entrez system. LinkOut aims to facilitate access to relevant online resources in order to extend, clarify, and supplement information found in the Entrez databases. Examples of LinkOut Resources include full-text publications, biological databases, consumer health information, research tools, and more.

All links are specially assigned to specific database records. When accessing a link through LinkOut, no additional searching should be necessary to access the relevant resource that has been linked to the record. Please encourage online resources that may be valuable to Entrez users to participate in LinkOut.

NCBI databases that can be used to query LinkOut

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?

LinkOut Query Example 1 : NCBI Database Identifier Mapping

The following query returns a document containing a mapping of identifiers from NCBI databases *only* for a given NCBI genome project (genomeprj) identifier:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?db=all&id=1&dbfrom=genomeprj


Document returned:


<?xml version="1.0"?>
<!DOCTYPE eLinkResult PUBLIC "-//NLM//DTD eLinkResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eLink_020511.dtd">
<eLinkResult>
<LinkSet>
        <DbFrom>genomeprj</DbFrom>
        <IdList>
                <Id>1</Id>
        </IdList>
        <LinkSetDb>
                <DbTo>genomeprj</DbTo>
                <LinkName>genomeprj_genomeprj</LinkName>
                <Info>Empty result</Info>
        </LinkSetDb>
        <LinkSetDb>
                <DbTo>nucest</DbTo>
                <LinkName>genomeprj_nucest</LinkName>
                <Info>Empty result</Info>
        .......
        <LinkSetDb>
                <DbTo>nucgss</DbTo>
                <LinkName>genomeprj_nucgss_wgs</LinkName>
                <Info>Empty result</Info>
        </LinkSetDb>
        <LinkSetDb>
                <DbTo>popset</DbTo>
                <LinkName>genomeprj_popset</LinkName>
                <Info>Empty result</Info>
        </LinkSetDb>
        <LinkSetDb>
                <DbTo>taxonomy</DbTo>
                <LinkName>genomeprj_taxonomy</LinkName>
                <Link>
                        <Id>240015</Id>
                </Link>
        </LinkSetDb>
</LinkSet>
</eLinkResult>
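To turn a document like the one above into an identifier mapping, the non-empty `LinkSetDb` entries have to be picked out. The following is a minimal sketch using Python's standard XML parser; the embedded sample is a shortened, well-formed excerpt of the document returned, and the function name is an assumption.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the eLink result shown above (elided entries left out).
SAMPLE = """<eLinkResult>
<LinkSet>
  <DbFrom>genomeprj</DbFrom>
  <IdList><Id>1</Id></IdList>
  <LinkSetDb>
    <DbTo>popset</DbTo>
    <LinkName>genomeprj_popset</LinkName>
    <Info>Empty result</Info>
  </LinkSetDb>
  <LinkSetDb>
    <DbTo>taxonomy</DbTo>
    <LinkName>genomeprj_taxonomy</LinkName>
    <Link><Id>240015</Id></Link>
  </LinkSetDb>
</LinkSet>
</eLinkResult>"""

def extract_links(xml_text):
    """Map each target database to its linked identifiers, skipping empty results."""
    root = ET.fromstring(xml_text)
    links = {}
    for link_set_db in root.iter("LinkSetDb"):
        ids = [element.text for element in link_set_db.findall("Link/Id")]
        if ids:
            # Several LinkSetDb entries may share one DbTo, so accumulate.
            links.setdefault(link_set_db.findtext("DbTo"), []).extend(ids)
    return links

mapping = extract_links(SAMPLE)
```

For the excerpt, only the taxonomy entry carries a link, so `mapping` holds a single key.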



LinkOut Query Example 2 : External Resources Identifier Mapping

The following query returns a mapping of external resources to a given NCBI genome project (genomeprj) identifier:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?db=all&id=16718&dbfrom=genomeprj&cmd=llinkslib


External Resources registered in LinkOut and associated with genomeprj

eLink XML Document Type Definition

eLink DTD

Resource File

A resource file is needed to associate links/identifiers with an identifier in an Entrez database.

[Resource File help page]

The resource file describes the Entrez records the provider will link from and contains the information that LinkOut needs to generate the links. Links described in the resource file should link directly to the resource; users should not have to perform any additional searching to access the resource after clicking the provider’s link.

From the list of elements that can be included in a resource file, the Database element is required:

Database (required): A sub-element of ObjectSelector or SubObjectSelector that specifies the Entrez database in which the links will appear.
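A provider could check the Database requirement before uploading a resource file. The sketch below is only an assumption about how such a check might look: it uses the element names mentioned on this page (`Link`, `ObjectSelector`, `SubObjectSelector`, `Database`), assumes a `LinkSet` root element, and runs on a deliberately incomplete fragment rather than a valid resource file.

```python
import xml.etree.ElementTree as ET

def links_missing_database(xml_text):
    """Return 0-based positions of <Link> elements whose selectors lack a <Database>."""
    root = ET.fromstring(xml_text)
    missing = []
    for position, link in enumerate(root.iter("Link")):
        has_database = any(
            selector.find("Database") is not None
            for tag in ("ObjectSelector", "SubObjectSelector")
            for selector in link.iter(tag)
        )
        if not has_database:
            missing.append(position)
    return missing

# Incomplete fragment for illustration; a real resource file needs further elements.
SAMPLE = """<LinkSet>
  <Link>
    <ObjectSelector><Database>genomeprj</Database></ObjectSelector>
  </Link>
  <Link>
    <ObjectSelector></ObjectSelector>
  </Link>
</LinkSet>"""

problems = links_missing_database(SAMPLE)
```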

LinkOut External Resource File : Attribute element

NCBI Documentation regarding Attribute:

http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helplinkout.section.files.Special_Elements_Att

From the NCBI documentation:

Attribute (repeatable): Attributes describe resources independent of content and describe any ownership of the information that is being claimed by the individual or organization providing the link. Attributes apply to all resources identified within a <Link>. See Special Elements: Attribute for the list of Attributes and descriptions.

Relevant Bio-Identifier Mapping Projects
