Bio::DB::SoapEUtilities - Interface to the NCBI Entrez web service *BETA*
use Bio::DB::SoapEUtilities;
# factory construction
my $fac = Bio::DB::SoapEUtilities->new();
# executing a utility call
# get an iterable adaptor
my $links = $fac->elink(
-dbfrom => 'protein',
-db => 'taxonomy',
-id => \@protein_ids )->run(-auto_adapt => 1);
# get a Bio::DB::SoapEUtilities::Result object
my $result = $fac->esearch(
-db => 'gene',
-term => 'sonic and human')->run;
# get the raw XML message
my $xml = $fac->efetch(
-db => 'gene',
-id => \@gids )->run( -raw_xml => 1 );
# change parameters
my $new_result = $fac->efetch(
-db => 'gene',
-id => \@more_gids)->run;
# reset parameters
$fac->efetch->reset_parameters( -db => 'nucleotide',
-id => $nucid );
$result = $fac->efetch->run;
# parsing and iterating the results
$count = $result->count;
@ids = $result->ids;
while ( my $linkset = $links->next_link ) {
$submitted = $linkset->submitted_id;
}
($taxid) = $links->id_map($submitted_prot_id);
$species_io = $fac->efetch( -db => 'taxonomy',
-id => $taxid )->run( -auto_adapt => 1);
$species = $species_io->next_species;
$linnaeus = $species->binomial;
This module allows the user to query the NCBI Entrez database via its SOAP
(Simple Object Access Protocol) web service (described at
<http://eutils.ncbi.nlm.nih.gov/entrez/eutils/soap/v2.0/DOC/esoap_help.html>).
The basic tools ("einfo, esearch, elink, efetch, espell, epost") are
available as methods off a "SoapEUtilities" factory object.
Parameters for each tool can be queried, set and reset for each method through
the Bio::ParameterBaseI standard calls ("available_parameters(),
set_parameters(), get_parameters(), reset_parameters()"). Returned data
can be retrieved, accessed and parsed in several ways, according to user
preference. Adaptors and object iterators are available for
"efetch", "egquery", "elink", and
"esummary" results.
The "SoapEU" system has been designed to be as easy (few includes,
available parameter facilities, reasonable defaults, intuitive aliases,
built-in pipelines) or as complex (accessors for underlying low-level objects,
all parameters accessible, custom hooks for builder objects, facilities for
providing local copies of WSDLs) as the user requires or desires. (To the
extent that it does not succeed in either direction, it is up to the user to
report to the mailing list ("FEEDBACK")!)
To begin, make a factory:
my $fac = Bio::DB::SoapEUtilities->new();
From the factory, utilities are called, parameters are set, and results or
adaptors are retrieved.
If you have your own copy of the WSDL, use
my $fac = Bio::DB::SoapEUtilities->new( -wsdl_file => $my_wsdl );
otherwise, the correct one will be obtained over the network (by Bio::DB::ESoap
and friends).
To run any of the standard NCBI EUtilities ("einfo, esearch, esummary,
efetch, elink, egquery, epost, espell"), call the desired utility from the
factory. To use a utility, you must set its parameters and run it to get a
result. TMTOWTDI:
# verbose
my $fetch = $fac->efetch();
$fetch->set_parameters( -db => 'gene', -id => [828392, 790]);
my $result = $fetch->run;
# compact
my $result = $fac->efetch(-db =>'gene',-id => [828392,790])->run;
# change ids
$fac->efetch->set_parameters( -id => 470338 );
$result = $fac->run;
# another util
$result = $fac->esearch(-db => 'protein', -term => 'BRCA and human')->run;
# the utilities are kept separate
%search_params = $fac->esearch->get_parameters;
%fetch_params = $fac->efetch->get_parameters;
$search_params{db}; # is 'protein'
$fetch_params{db}; # is 'gene'
The factory is Bio::ParameterBaseI compliant: that means you can find out what
you can set with
@available_search = $fac->esearch->available_parameters;
@available_egquery = $fac->egquery->available_parameters;
For more information on parameters, see
<http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html>.
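Because each utility answers "available_parameters()" separately, it can be handy to gather the answers for several utilities at once. The following is a minimal sketch, not part of the module: the helper name "available_by_util" is hypothetical, and $fac is assumed to be a factory built as shown earlier.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by Bio::DB::SoapEUtilities):
# collect the settable parameter names for each named utility.
# $fac is assumed to be a Bio::DB::SoapEUtilities factory.
sub available_by_util {
    my ($fac, @utils) = @_;
    my %available;
    for my $util (@utils) {
        # same call as $fac->esearch->available_parameters, with the
        # utility name held in a variable
        $available{$util} = [ $fac->$util->available_parameters ];
    }
    return \%available;
}
```

With a live factory, something like available_by_util($fac, qw(esearch egquery)) should return a hash reference keyed by utility name, each value being an array reference of parameter names.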
The "intermediate" object for "SoapEU" query results is the
Bio::DB::SoapEUtilities::Result. This is a BioPerly parsing of the SOAP
message sent by NCBI when a query is "run()". This can be very
useful on its own, but most users will likely want to proceed directly to
"Adaptors", which take a "Result" and turn it into more
intuitive/familiar BioPerl objects. Go there if the following details are too
gory.
Results can be highly- or lowly-parsed, depending on the parameters passed to
the factory "run()" method. To get the raw XML message with no
parsing, do
my $xml = $fac->$util->run(-raw_xml => 1); # $xml is a scalar string
To retrieve a Bio::DB::SoapEUtilities::Result object with limited parsing, but
with accessors to the SOAP::SOM message (provided by SOAP::Lite), do
my $result = $fac->$util->run(-no_parse => 1);
my $som = $result->som;
my $method_hash = $som->method; # etc...
To retrieve a "Result" object with message elements parsed into
accessors, including "count()" and "ids()", run without
arguments:
my $result = $fac->esearch->run();
my $count = $result->count;
my @Count = $result->Count; # counts for each member of
# the translation stack
my @ids = $result->IdList_Id; # from automatic message parsing
@ids = $result->ids; # a convenient alias
See Bio::DB::SoapEUtilities::Result for more, even gorier details.
Adaptors convert EUtility "Result"s into convenient objects, via a
handle that usually provides an iterator, in the spirit of Bio::SeqIO. These
are probably more useful than the "Result" to the typical user, and
so you can retrieve them automatically by setting the "run()"
parameter "-auto_adapt => 1".
In general, retrieve an adaptor like so:
$adp = $fac->$util->run( -auto_adapt => 1 );
# iterate...
while ( my $obj = $adp->next_obj ) {
# do stuff with $obj
}
The adaptor itself occasionally possesses useful methods besides the iterator.
The method "next_obj" always works, but a natural alias is also
always available:
$seqio = $fac->esearch->run( -auto_adapt => 1 );
while ( my $seq = $seqio->next_seq ) {
# do stuff with $seq
}
In the above example, "-auto_adapt => 1" also instructs the factory
to perform an "efetch" based on the ids returned by the
"esearch" (if any), so that the adaptor returned iterates over
Bio::SeqI objects.
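Since "next_obj" is described above as working on every adaptor, the iteration can be wrapped in a small generic helper. This is only a sketch under that assumption: "take_objects" is a hypothetical name, and $adp stands for whatever "run( -auto_adapt => 1 )" returned.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by the module): drain an adaptor
# into a list, optionally stopping after $limit objects.
# $adp is assumed to respond to next_obj() and to return a false
# value when the iterator is exhausted.
sub take_objects {
    my ($adp, $limit) = @_;
    my @objs;
    while ( my $obj = $adp->next_obj ) {
        push @objs, $obj;
        last if defined $limit && @objs >= $limit;
    }
    return @objs;
}
```

For example, take_objects($adp, 10) would collect at most the first ten objects, which can be convenient when exploring a large result.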
Here is a rundown of the different adaptor flavors:
- "efetch", Fetch Adaptors, and BioPerl object iterators
The "FetchAdaptor" creates bona fide BioPerl objects. Currently,
there are FetchAdaptor subclasses for sequence data (both Genbank and
FASTA rettypes) and taxonomy data. The choice of FetchAdaptor is based on
information in the result message, and should be transparent to the user.
$seqio = $fac->efetch( -db =>'nucleotide',
-id => \@ids,
-rettype => 'gb' )->run( -auto_adapt => 1 );
while (my $seq = $seqio->next_seq) {
my $taxio = $fac->efetch(
-db => 'taxonomy',
-id => $seq->species->ncbi_taxid )->run(-auto_adapt => 1);
my $tax = $taxio->next_species;
unless ( $tax->TaxId == $seq->species->ncbi_taxid ) {
print "more work for MAJ"
}
}
See the pod for the FetchAdaptor subclasses (e.g.,
Bio::DB::SoapEUtilities::FetchAdaptor::seq) for more detail.
- "elink", the Link adaptor, and the "linkset" iterator
The "LinkAdaptor" manages LinkSets. In "SoapEU", an
"elink" call always preserves the correspondence between
submitted and retrieved ids. The mapping between these can be accessed
from the adaptor object directly as "id_map()":
my $links = $fac->elink( -db => 'protein',
-dbfrom => 'nucleotide',
-id => \@nucids )->run( -auto_adapt => 1 );
# maybe more than one associated id...
my @prot_0 = $links->id_map( $nucids[0] );
Or iterate over the linksets:
while ( my $ls = $links->next_linkset ) {
@ids = $ls->ids;
@submitted_ids = $ls->submitted_ids;
# etc.
}
- "esummary", the DocSum adaptor, and the "docsum" iterator
The "DocSumAdaptor" manages docsums, the "esummary"
return type. The objects returned by iterating with a
"DocSumAdaptor" have accessors that let you obtain field
information directly. Docsums contain lots of easy-to-forget fields; use
"item_names()" to remind yourself.
my $docs = $fac->esummary( -db => 'taxonomy',
-id => 527031 )->run(-auto_adapt=>1);
# iterate over docsums
while (my $d = $docs->next_docsum) {
@available_items = $d->item_names;
# any available item can be called as an accessor
# from the docsum object...watch your case...
$sci_name = $d->ScientificName;
$taxid = $d->TaxId;
}
- "egquery", the GQuery adaptor, and the "query" iterator
The "GQueryAdaptor" manages global query items returned by calls
to "egquery", which identifies all NCBI databases containing
hits for your query term. The databases actually containing hits can be
retrieved directly from the adaptor with "found_in_dbs":
my $queries = $fac->egquery(
-term => 'BRCA and human'
)->run(-auto_adapt=>1);
my @dbs = $queries->found_in_dbs;
Retrieve the global query info returned for any database with
"query_by_db":
my $prot_q = $queries->query_by_db('protein');
if ($prot_q->count) {
#do something
}
Or iterate as usual:
while ( my $q = $queries->next_query ) {
if ($q->status eq 'Ok') {
# do sth
}
}
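As an illustration of the "id_map()" call described for the Link adaptor above, the mapping for a whole batch of submitted ids can be gathered into one hash. This is a sketch only: "link_table" is a hypothetical helper, and $links is assumed to be the adaptor returned by an "elink" run with "-auto_adapt => 1".

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by the module): build a lookup
# table of submitted id => [ linked ids ] from a Link adaptor.
sub link_table {
    my ($links, @submitted) = @_;
    my %table;
    for my $id (@submitted) {
        # id_map() may return more than one associated id,
        # so each entry is stored as an array reference
        $table{$id} = [ $links->id_map($id) ];
    }
    return \%table;
}
```

Calling link_table($links, @nucids) would then give, for each nucleotide id, an array reference of the linked ids (empty if the adaptor reports none).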
To make large or complex requests for data, or to share queries, it may be
helpful to use the NCBI WebEnv system to manage your queries. Each EUtility
accepts the following parameters:
-usehistory
-WebEnv
-QueryKey
for this purpose. These store the details of your queries serverside.
"SoapEU" attempts to make using these relatively straightforward. Use
"Result" objects to obtain the correct parameters, and don't forget
"-usehistory":
my $result1 = $fac->esearch(
-term => 'BRCA and human',
-db => 'nucleotide',
-usehistory => 1 )->run( -no_parse=>1 );
my $result2 = $fac->esearch(
-term => 'AND early onset',
-QueryKey => $result1->query_key,
-WebEnv => $result1->webenv )->run( -no_parse => 1 );
my $result = $fac->esearch(
-db => 'protein',
-term => 'sonic',
-usehistory => 1 )->run( -no_parse => 1 );
# later (but not more than 8 hours later) that day...
$result = $fac->esearch(
-WebEnv => $result->webenv,
-QueryKey => $result->query_key,
-RetMax => 800 # get 'em all
)->run; # note we're parsing the result...
@all_ids = $result->ids;
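The history parameters always travel as a pair, so a small helper can pull them from a "Result" in one step. A minimal sketch, assuming the "webenv" and "query_key" accessors behave as in the examples above; the name "history_args" is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by the module): turn a Result from
# a -usehistory run into the named arguments for a follow-up call.
# Returns an empty list if either value is missing.
sub history_args {
    my ($result) = @_;
    my ($webenv, $query_key) = ( $result->webenv, $result->query_key );
    return unless defined $webenv && defined $query_key;
    return ( -WebEnv => $webenv, -QueryKey => $query_key );
}
```

A follow-up request could then be written as $fac->esearch( history_args($result), -RetMax => 800 )->run, keeping the pairing of "-WebEnv" and "-QueryKey" in one place.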
Two kinds of errors can ensue on an Entrez SOAP run. One is a SOAP fault, and
the other is an error sent in a non-faulted SOAP message from the server. The
distinction is probably systematic, and I would welcome an explanation of it.
To check for result errors, try something like:
unless ( $result = $fac->$util->run ) {
die $fac->errstr; # this will catch a SOAP fault
}
# a valid result object was returned, but it may carry an error
if ($result->count == 0) {
warn "No hits returned";
if ($result->ERROR) {
warn "Entrez error : ".$result->ERROR;
}
}
Error handling will be improved in the package eventually.
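Until then, the two checks above can be folded into one wrapper. This is a sketch under the assumptions of the example above (a false return from "run" signals a SOAP fault, and "count" and "ERROR" are available on the result, as for "esearch"); "run_checked" is a hypothetical name, and the utility's parameters are assumed to be set already.

```perl
use strict;
use warnings;

# Hypothetical wrapper (not provided by the module): run a utility and
# return ($result, $problem), where $problem is undef when all is well.
sub run_checked {
    my ($fac, $util) = @_;
    my $result = $fac->$util->run;
    unless ($result) {
        # run() returned false: report the SOAP fault text
        return ( undef, 'SOAP fault: ' . ( $fac->errstr // 'no message' ) );
    }
    if ( $result->count == 0 ) {
        # a valid Result came back, but it may carry a server error
        my $err = $result->ERROR;
        return ( $result, $err ? "Entrez error: $err" : 'No hits returned' );
    }
    return ( $result, undef );
}
```

A caller might write my ($result, $problem) = run_checked($fac, 'esearch'); and then warn or die on $problem as appropriate.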
Bio::DB::EUtilities, Bio::DB::SoapEUtilities::Result, Bio::DB::ESoap.
User feedback is an integral part of the evolution of this and other Bioperl
modules. Send your comments and suggestions preferably to the Bioperl mailing
list. Your participation is much appreciated.
bioperl-l@bioperl.org - General discussion
http://bioperl.org/wiki/Mailing_lists - About the mailing lists
Please direct usage questions or support issues to the mailing list:
bioperl-l@bioperl.org
rather than to the module maintainer directly. Many experienced and responsive
experts will be able to look at the problem and quickly address it. Please
include a thorough description of the problem with code and data examples if
at all possible.
Report bugs to the Bioperl bug tracking system to help us keep track of the bugs
and their resolution. Bug reports can be submitted via the web:
http://redmine.open-bio.org/projects/bioperl/
Email maj -at- fortinbras -dot- us
The rest of the documentation details each of the object methods. Internal
methods are usually preceded with a _
Title : new
Usage : my $eutil = Bio::DB::SoapEUtilities->new();
Function: Builds a new Bio::DB::SoapEUtilities object
Returns : an instance of Bio::DB::SoapEUtilities
Args : -wsdl_file => $path_to_local_wsdl [optional]
Title : run
Usage : $fac->$eutility->run(@args)
Function: Execute the EUtility
Returns : true on success, false on fault or error
(reason in errstr(), for more detail check the SOAP message
in last_result() )
Args : named params appropriate to utility
-auto_adapt => boolean ( return an iterator over results as
appropriate to util if true)
-raw_xml => boolean ( return raw xml result; no processing )
Bio::DB::SoapEUtilities::Result constructor parms
Title : response_message
Aliases : last_response, last_result
Usage : $som = $fac->response_message
Function: get the last response message
Returns : a SOAP::SOM object
Args : none
Title : webenv
Usage :
Function: contains WebEnv key referencing the session
(set after run() )
Returns : scalar
Args : none
Title : errstr
Usage : $fac->errstr
Function: get the last error, if any
Example :
Returns : value of errstr (a scalar)
Args : none
Title : available_parameters
Usage : @params = $fac->$util->available_parameters;
Function: get available request parameters for calling
utility
Returns :
Args : -util => $desired_utility [optional, default is
caller utility]
Title : set_parameters
Usage : $fac->$util->set_parameters( -db => 'gene', -id => \@ids );
Function:
Returns : none
Args : -util => $desired_utility [optional, default is
caller utility],
named utility arguments
Title : get_parameters
Usage : %params = $fac->$util->get_parameters;
Function:
Returns : array of named parameters
Args : utility (scalar string) [optional]
(default is caller utility)
Title : reset_parameters
Usage : $fac->$util->reset_parameters( -db => 'nucleotide', -id => $nucid );
Function:
Returns : none
Args : -util => $desired_utility [optional, default is
caller utility],
named utility arguments
Title : parameters_changed
Usage :
Function:
Returns : boolean
Args : utility (scalar string) [optional]
(default is caller utility)
Title : _soap_facs
Usage : $self->_soap_facs($util, $fac)
Function: caches Bio::DB::ESoap factories for the
eutils in use by this instance
Example :
Returns : Bio::DB::ESoap object
Args : $eutility, [optional on set] $esoap_factory_object
Title : _caller_util
Usage : $self->_caller_util($newval)
Function: the utility requested off the main SoapEUtilities
object
Example :
Returns : value of _caller_util (a scalar string, a valid eutility)
Args : on set, new value (a scalar string [optional])