class InterMine::PathQuery::Query

A class representing a structured query against an InterMine Data-Warehouse

Queries represent structured requests for data from an InterMine data-warehouse. They consist of the output columns you select (the “view”) and a set of constraints on the results to return (the “constraints”). In a nod to the SQL origins of the queries, and to the syntax of ActiveRecord, there are two ways to build them: a method-chaining, SQL-like DSL, and the common InterMine DSL, in which each step is a separate method call.

query = service.query("Gene").select("*").where("proteins.molecularWeight" => {">" => 10000})
query.each_result do |gene|
  puts gene.symbol
end

OR:

query = service.query("Gene")
query.add_views("*")
query.add_constraint("proteins.molecularWeight", ">", 10000)
...

The main difference from SQL is that joins between tables are implicit and automatic. Simply by naming the column “Gene.proteins.molecularWeight” we gain access to the protein table joined onto the gene table. (A consequence of this is that every query must have a single root from which all paths descend, and right outer joins are not permitted.)

You can define the following features of a query:

* The output columns
* The filtering constraints (what values certain columns must or must not have)
* The sort order of the results
* The way constraints are combined (AND or OR)

In processing results, there are two powerful result formats available, depending on whether you want to process results row by row, or whether you would like the information grouped into logically coherent records. The latter is more similar to the ORM model, and can be seen above. The mechanisms we offer for row access allow accessing cell values of the result table transparently by index or column-name.

author

Alex Kalderimis dev@intermine.org

homepage

www.intermine.org

Licence

Copyright (C) 2002-2011 FlyMine

This code may be freely distributed and modified under the terms of the GNU Lesser General Public Licence. This should be distributed with the code. See the LICENSE file for more information or www.gnu.org/copyleft/lesser.html.

Constants

HIGHEST_CODE

The last possible constraint code

LOWEST_CODE

The first possible constraint code

Attributes

constraints[R]

All the current constraints on the query

joins[R]

All the current Join objects on the query

list_append_uri[R]

URLs for internal consumption.

list_upload_uri[R]

URLs for internal consumption.

logic[R]

The current logic (as a LogicGroup)

model[R]

The data model associated with the query

name[RW]

The (optional) name of the query. Used in automatic access (eg: “query1”)

root[RW]

The root class of the query.

service[R]

The service this query is associated with

size[RW]

The number of rows to return - defaults to nil (all rows)

sort_order[R]

The current sort-order.

start[RW]

The index of the first row to return - defaults to 0 (first row)

title[RW]

A human-readable title for the query (eg: “Gene --> Protein Domain”)

views[R]

All the columns currently selected for output.

Public Class Methods

is_valid_code(str)

Whether or not the argument is a valid constraint code.

To be valid, it must be a one-character string between A and Z inclusive.

    # File lib/intermine/query.rb
    def self.is_valid_code(str)
        return (str.length == 1) && (str >= LOWEST_CODE) && (str <= HIGHEST_CODE)
    end
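As a minimal, standalone sketch of the check above (not the library code itself): the constant values "A" and "Z" are assumptions inferred from the description, and valid_code? is a hypothetical name.

```ruby
# Assumed values: the description says codes run from A to Z inclusive.
LOWEST_CODE  = "A"
HIGHEST_CODE = "Z"

# Hypothetical stand-in for Query.is_valid_code.
def valid_code?(str)
  str.length == 1 && str >= LOWEST_CODE && str <= HIGHEST_CODE
end

puts valid_code?("A")   # a single upper-case letter
puts valid_code?("AA")  # too long to be a code
puts valid_code?("a")   # lower-case letters sort after "Z"
```

Because the comparison is a plain string comparison, lower-case letters and multi-character strings are rejected.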
new(model, root=nil, service=nil)

Construct a new query object. You should not use this directly. Instead use the factory methods in Service.

query = service.query("Gene")

    # File lib/intermine/query.rb
    def initialize(model, root=nil, service=nil)
        @model = model
        @service = service
        @url = (@service.nil?) ? nil : @service.root + Service::QUERY_RESULTS_PATH
        @list_upload_uri = (@service.nil?) ? nil : @service.root + Service::QUERY_TO_LIST_PATH
        @list_append_uri = (@service.nil?) ? nil : @service.root + Service::QUERY_APPEND_PATH
        if root
            @root = InterMine::Metadata::Path.new(root, model).rootClass
        end
        @size = nil
        @start = 0
        @constraints = []
        @joins = []
        @views = []
        @sort_order = []
        @used_codes = []
        @logic_parser = LogicParser.new(self)
        @constraint_factory = ConstraintFactory.new(self)
    end
parser(model)

Return a parser for deserialising queries.

parser = Query.parser(service.model)
query = parser.parse(string)
query.each_row do |r|
  puts r.to_h
end

    # File lib/intermine/query.rb
    def self.parser(model)
        return QueryLoader.new(model)
    end

Public Instance Methods

add_constraint(*parameters)

Add a constraint to the query matching the given parameters, and return the created constraint.

con = query.add_constraint("length", ">", 500)

Note that (at least for now) the argument styles used by where and add_constraint are not compatible with each other. This is on the TODO list.

    # File lib/intermine/query.rb
    def add_constraint(*parameters)
        con = @constraint_factory.make_constraint(parameters)
        @constraints << con
        return con
    end
add_join(path, style="OUTER")

Declare how a particular join should be treated.

Joins in a query are INNER by default, but they can optionally be declared to be LEFT OUTER joins, which is what this method does unless another style is given. The difference is that with an inner join, each join in the query implicitly constrains the values of that path to be non-null, whereas an outer join allows null values in the joined path. If the path passed to this method has a chain of joins, the last section is the one the join is applied to.

query = service.query("Gene")
# Allow genes without proteins
query.add_join("proteins")
# Demand the results contain only those genes that have interactions that have interactingGenes,
# but allow those interactingGenes to not have any proteins.
query.add_join("interactions.interactingGenes.proteins")

The valid join styles are OUTER and INNER (case-insensitive). There is never any need to declare a join to be INNER, as it is inner by default. Consider using Query#outerjoin, which is more explicitly declarative.

    # File lib/intermine/query.rb
    def add_join(path, style="OUTER")
        p = InterMine::Metadata::Path.new(add_prefix(path), @model, subclasses)
        if @root.nil?
            @root = p.rootClass
        end
        @joins << Join.new(p, style)
        return self
    end
Also aliased as: join
add_prefix(x)

Adds the root prefix to the given string.

Arguments:

x

An object with a to_s method

Returns the prefixed string.

    # File lib/intermine/query.rb
    def add_prefix(x)
        x = x.to_s
        if @root && !x.start_with?(@root.name)
            return @root.name + "." + x
        else
            return x
        end
    end
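The prefixing rule can be tried in isolation. The following is a sketch only: Root is a hypothetical stand-in for the query's root class descriptor, and the root is passed in as an argument rather than read from the query.

```ruby
# Hypothetical stand-in for the query's root class descriptor.
Root = Struct.new(:name)

# Mirrors the documented logic: prefix the root name unless it is already there.
def add_prefix(root, x)
  x = x.to_s
  if root && !x.start_with?(root.name)
    "#{root.name}.#{x}"
  else
    x
  end
end

gene = Root.new("Gene")
puts add_prefix(gene, :symbol)         # Gene.symbol
puts add_prefix(gene, "Gene.symbol")   # Gene.symbol (unchanged)
puts add_prefix(nil, "proteins.name")  # proteins.name (no root set yet)
```

Note that the test is a plain string prefix match, so any path that merely begins with the root name is left untouched.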
add_sort_order(path, direction="ASC")

Add a sort order element to the sort order information. A sort order consists of the name of an output column and (optionally) the direction to sort in. The default direction is “ASC”. The valid directions are “ASC” and “DESC” (case-insensitive). The column must already be in the view; otherwise an ArgumentError is raised.

query.add_sort_order("length")
query.add_sort_order("proteins.primaryIdentifier", "desc")

    # File lib/intermine/query.rb
    def add_sort_order(path, direction="ASC")
        p = self.path(path)
        if !@views.include? p
            raise ArgumentError, "Sort order (#{p}) not in view (#{@views.map {|v| v.to_s}.inspect} in #{self.name || 'unnamed query'})"
        end
        @sort_order << SortOrder.new(p, direction)
        return self
    end
Also aliased as: order_by, order
add_to_select(*views)
Alias for: add_views
add_views(*views)

Add the given views (output columns) to the query.

Any columns ending in “*” will be interpreted as a request to add all attribute columns from that table to the query.

Any columns that name a class or reference will add the id of that object to the query. This is helpful for creating lists and other specialist services.

query = service.query("Gene")
query.add_views("*")
query.add_to_select("*")
query.add_views("proteins.*")
query.add_views("pathways.*", "organism.shortName")
query.add_views("proteins", "exons")

    # File lib/intermine/query.rb
    def add_views(*views)
        views.flatten.map do |x|
            y = add_prefix(x)
            if y.end_with?("*")
                prefix = y.chomp(".*")
                path = make_path(prefix)
                add_views(path.end_cd.attributes.map {|x| prefix + "." + x.name})
            else
                path = make_path(y)
                path = make_path(y.to_s + ".id") unless path.is_attribute?
                if @root.nil?
                    @root = path.rootClass
                end
                @views << path
            end
        end
        return self
    end
Also aliased as: add_to_select
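To make the two expansion rules concrete, here is a rough sketch that runs without a service. MODEL is a made-up two-class model, and end_class and expand_view are hypothetical helpers; the real method resolves paths through InterMine::Metadata::Path instead.

```ruby
# Made-up two-class model, for illustration only.
MODEL = {
  "Gene"    => { attributes: %w[symbol length], references: { "proteins" => "Protein" } },
  "Protein" => { attributes: %w[primaryIdentifier], references: {} }
}

# Follow references from the root class to find the class a path ends on.
def end_class(path)
  root, *refs = path.split(".")
  refs.reduce(root) { |cls, ref| MODEL.fetch(cls)[:references].fetch(ref) }
end

# Expand one (already prefixed) view string into concrete column paths.
def expand_view(view)
  if view.end_with?("*")
    # "Class.*" becomes one column per attribute of the class the path ends on.
    prefix = view.chomp(".*")
    MODEL.fetch(end_class(prefix))[:attributes].map { |a| "#{prefix}.#{a}" }
  else
    # A path that names a class or reference gets ".id" appended.
    *head, last = view.split(".")
    attribute = !head.empty? &&
                MODEL.fetch(end_class(head.join(".")))[:attributes].include?(last)
    [attribute ? view : "#{view}.id"]
  end
end

p expand_view("Gene.*")           # ["Gene.symbol", "Gene.length"]
p expand_view("Gene.proteins.*")  # ["Gene.proteins.primaryIdentifier"]
p expand_view("Gene.proteins")    # ["Gene.proteins.id"]
p expand_view("Gene.symbol")      # ["Gene.symbol"]
```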
all()

Return all result record objects returned by running this query.

    # File lib/intermine/query.rb
    def all
        return self.results
    end
all_rows()

Return all the rows returned by running the query.

    # File lib/intermine/query.rb
    def all_rows
        return self.rows
    end
coded_constraints()

Return all the constraints that have codes and can thus participate in logic.

    # File lib/intermine/query.rb
    def coded_constraints
        return @constraints.select {|x| !x.is_a?(SubClassConstraint)}
    end
constraintLogic=(value)
Alias for: set_logic
count()

Return the number of result rows this query will return in its current state. This makes a very small request to the webservice, and is the most efficient method of getting the size of the result set.

    # File lib/intermine/query.rb
    def count
        return results_reader.get_size
    end
each_result(start=nil, size=nil) { |row| ... }

Iterate over the results, one record at a time.

query.each_result do |gene|
  puts gene.symbol
  gene.proteins.each do |prot|
    puts prot.primaryIdentifier
  end
end

This method is now deprecated and will be removed in version 1. Please use Query#results instead.

    # File lib/intermine/query.rb
    def each_result(start=nil, size=nil)
        start = start.nil? ? @start : start
        size  = size.nil? ? @size : size
        results_reader(start, size).each_result {|row|
            yield row
        }
    end
each_row(start=nil, size=nil) { |row| ... }

Iterate over the results of this query one row at a time.

Rows support both array-like index-based access and hash-like key-based access. For key-based access you can use either the full path or the headless short version:

query.each_row do |row|
  puts row["Gene.symbol"], row["proteins.primaryIdentifier"]
  puts row[0]
  puts row.to_a # Materialize the row as an Array
  puts row.to_h # Materialize the row as a Hash
end

This method is now deprecated and will be removed in version 1. Please use Query#rows instead.

    # File lib/intermine/query.rb
    def each_row(start=nil, size=nil)
        start = start.nil? ? @start : start
        size  = size.nil? ? @size : size
        results_reader(start, size).each_row {|row|
            yield row
        }
    end
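The three access styles described above (index, full path, headless path) could be modelled roughly as follows. SketchRow is a hypothetical class written for this illustration; it is not the library's ResultRow, and the values are placeholders.

```ruby
# Hypothetical stand-in for a result row; not the library's ResultRow class.
class SketchRow
  def initialize(root, columns, values)
    @root = root
    @columns = columns
    @values = values
  end

  # Accept an integer index, a full path, or a path without the root ("headless").
  def [](key)
    return @values[key] if key.is_a?(Integer)
    full = key.start_with?("#{@root}.") ? key : "#{@root}.#{key}"
    index = @columns.index(full)
    raise IndexError, "no such column: #{key}" if index.nil?
    @values[index]
  end

  def to_a
    @values.dup
  end

  def to_h
    Hash[@columns.zip(@values)]
  end
end

row = SketchRow.new("Gene",
                    ["Gene.symbol", "Gene.proteins.primaryIdentifier"],
                    ["eve", "example-protein-id"])
puts row["Gene.symbol"]                  # full path
puts row["proteins.primaryIdentifier"]   # headless path
puts row[0]                              # index
```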
eql?(other)

Return true if the other query has exactly the same configuration, and belongs to the same service.

    # File lib/intermine/query.rb
    def eql?(other)
        if other.is_a? Query
            return self.service == other.service && self.to_xml.to_s == other.to_xml.to_s
        else
            return false
        end
    end
first(start=0)

Get the first result record from the query, starting at the given offset. If the offset is large, then this is not an efficient way to retrieve this data, and you may wish to consider a looping approach or row-based access instead.

    # File lib/intermine/query.rb
    def first(start=0)
        current_row = 0
        # Have to iterate as start refers to row count
        results_reader.each_result { |r|
            if current_row == start
                return r
            end
            current_row += 1
        }
        return nil
    end
first_row(start = 0)

Get the first row of results from the query, starting at the given offset.

    # File lib/intermine/query.rb
    def first_row(start = 0)
        return self.results(start, 1).first
    end
get_constraint(code)

Get the constraint on the query with the given code. Raises an error if there is no such constraint.

    # File lib/intermine/query.rb
    def get_constraint(code)
        @constraints.each do |x|
            if x.respond_to?(:code) and x.code == code
                return x
            end
        end
        raise ArgumentError, "#{code} not in query"
    end
inspect()

Return an informative textual representation of the query.

    # File lib/intermine/query.rb
    def inspect
        return "<#{self.class.name} query=#{self.to_s.inspect}>"
    end
join(path, style="OUTER")
Alias for: add_join
limit(size)

Set the maximum number of rows this query will return.

This method can be used to set a default maximum size for a query. Set to nil for all rows. The value given here will be overridden by any value supplied by each_row or each_result, unless that value is nil, in which case this value will be used. If unset, the query will return all results.

Returns self for method chaining.

See also size= and offset

    # File lib/intermine/query.rb
    def limit(size)
        @size = size
        return self
    end
make_path(path)

    # File lib/intermine/query.rb
    def make_path(path)
        return InterMine::Metadata::Path.new(path, @model, subclasses)
    end
next_code()

Get the next available code for the query.

    # File lib/intermine/query.rb
    def next_code
        c = LOWEST_CODE
        while Query.is_valid_code(c)
            return c unless used_codes.include?(c)
            c = c.next
        end
        raise RuntimeError, "Maximum number of codes reached - all 26 have been allocated"
    end
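The allocation loop relies on String#next to step through the alphabet. The sketch below reproduces it on its own, assuming the codes run from "A" to "Z" and taking the list of used codes as an argument instead of reading it from the query.

```ruby
# Assumes codes run from "A" to "Z", as the description of is_valid_code states.
def next_code(used_codes)
  code = "A"
  while code.length == 1 && code <= "Z"
    return code unless used_codes.include?(code)
    code = code.next   # "A".next is "B"; "Z".next is "AA", which ends the loop
  end
  raise RuntimeError, "Maximum number of codes reached - all 26 have been allocated"
end

puts next_code([])          # A
puts next_code(%w[A B])     # C
puts next_code(%w[A C])     # B (the first gap is reused)
```

Since the search always starts from the lowest code, a code freed by removing a constraint is handed out again before any higher one.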
offset(start)

Set the index of the first row of results this query will return.

This method can be used to set a value for the query offset. The value given here will be overridden by any value supplied by each_row or each_result. If unset, results will start from the first row.

Returns self for method chaining.

See also start= and limit

    # File lib/intermine/query.rb
    def offset(start)
        @start = start
        return self
    end
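How the stored offset and limit interact with values passed at iteration time can be sketched as below. PagedSketch and its window method are hypothetical; they only reproduce the "use the argument unless it is nil" rule visible in each_row, each_result, rows and results.

```ruby
# Hypothetical stand-in that keeps only the paging state of a query.
class PagedSketch
  def initialize
    @start = 0    # documented default: first row
    @size = nil   # documented default: all rows
  end

  def limit(size)
    @size = size
    self
  end

  def offset(start)
    @start = start
    self
  end

  # Resolve the window actually requested: explicit arguments win unless nil.
  def window(start = nil, size = nil)
    [start.nil? ? @start : start, size.nil? ? @size : size]
  end
end

q = PagedSketch.new.offset(20).limit(10)
p q.window          # [20, 10]
p q.window(0, 5)    # [0, 5]
p q.window(nil, 5)  # [20, 5]
```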
order(path, direction="ASC")
Alias for: add_sort_order
order_by(path, direction="ASC")
Alias for: add_sort_order
outerjoin(path)

Explicitly declare a join to be an outer join.

    # File lib/intermine/query.rb
    def outerjoin(path)
        return add_join(path)
    end
params()

Return the parameter hash for running this query in its current state.

    # File lib/intermine/query.rb
    def params
        hash = {"query" => self.to_xml}
        if @service and @service.token
            hash["token"] = @service.token
        end
        return hash
    end
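The shape of the resulting hash can be seen in a small sketch. build_params is a hypothetical helper, and the XML string and token are placeholders rather than real values.

```ruby
# Hypothetical helper mirroring the documented logic: the token is only
# included when one is available.
def build_params(query_xml, token = nil)
  params = { "query" => query_xml }
  params["token"] = token if token
  params
end

xml = '<query view="Gene.symbol"/>'   # placeholder query XML
p build_params(xml)                   # only the "query" key
p build_params(xml, "placeholder")    # "query" and "token" keys
```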
path(pathstr)

Returns a Path object constructed from the given path-string, taking the current state of the query into account (its data-model and subclass constraints).

    # File lib/intermine/query.rb
    def path(pathstr)
        return InterMine::Metadata::Path.new(add_prefix(pathstr), @model, subclasses)
    end
remove_constraint(code)

Remove the constraint with the given code from the query. If no such constraint exists, no error will be raised.

    # File lib/intermine/query.rb
    def remove_constraint(code)
        @constraints.reject! do |x|
            x.respond_to?(:code) and x.code == code
        end
    end
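The contrast between get_constraint (raises when the code is missing) and remove_constraint (silently does nothing) can be sketched as follows. Constraint is a hypothetical Struct, and the constraint list is passed in explicitly rather than held by a query.

```ruby
# Hypothetical coded constraint, for illustration only.
Constraint = Struct.new(:code, :path)

def get_constraint(constraints, code)
  constraints.each { |c| return c if c.respond_to?(:code) && c.code == code }
  raise ArgumentError, "#{code} not in query"
end

def remove_constraint(constraints, code)
  constraints.reject! { |c| c.respond_to?(:code) && c.code == code }
end

list = [Constraint.new("A", "Gene.symbol"), Constraint.new("B", "Gene.length")]
puts get_constraint(list, "B").path   # Gene.length
remove_constraint(list, "B")
puts list.length                      # 1
remove_constraint(list, "Q")          # no such code: nothing happens, no error
```

Array#reject! returns nil when nothing was removed, so the return value of such a method is not a reliable way to get the updated list.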
results(start=nil, size=nil)

Return objects corresponding to the type of data requested, starting at the given row offset. Returns an Enumerable of InterMineObject, where each value is read one at a time from the connection.

genes = query.results
genes.last.symbol
=> "eve"

    # File lib/intermine/query.rb
    def results(start=nil, size=nil)
        start = start.nil? ? @start : start
        size  = size.nil? ? @size : size
        return Results::ObjectReader.new(@url, self, start, size)
    end
results_reader(start=0, size=nil)

Get your own result reader for handling the results at a low level. If no columns have been selected for output before requesting results, all attribute columns will be selected.

    # File lib/intermine/query.rb
    def results_reader(start=0, size=nil)
        if @views.empty?
            select("*")
        end
        return Results::ResultsReader.new(@url, self, start, size)
    end
rows(start=nil, size=nil)

Returns an Enumerable of ResultRow objects containing the data returned by running this query, starting at the given offset and containing up to the given maximum size.

The webservice enforces a maximum page-size of 10,000,000 rows, independent of any size you specify. For larger result sets, work around this limit by paging.

rows = query.rows
rows.last["symbol"]
=> "eve"

    # File lib/intermine/query.rb
    def rows(start=nil, size=nil)
        start = start.nil? ? @start : start
        size  = size.nil? ? @size : size
        return Results::RowReader.new(@url, self, start, size)
    end
select(*view)
Alias for: view=
sequences(range)

    # File lib/intermine/query.rb
    def sequences(range)
        return Results::SeqReader.new(@service.root, clone, range)
    end
set_logic(value)

Set the logic to the given value.

The value will be parsed for consistency if it is a logic string.

Returns self to support chaining.

    # File lib/intermine/query.rb
    def set_logic(value)
        if value.is_a?(LogicGroup)
            @logic = value
        else
            @logic = @logic_parser.parse_logic(value)
        end
        return self
    end
Also aliased as: constraintLogic=
sortOrder=(so)

Set the sort order completely, replacing the current sort order.

query.sortOrder = "Gene.length asc Gene.proteins.length desc"

The sort order expression will be parsed and checked for conformity with the current state of the query.

    # File lib/intermine/query.rb
    def sortOrder=(so)
        if so.is_a?(Array)
            sos = so
        else
            sos = so.split(/(ASC|DESC|asc|desc)/).map {|x| x.strip}.every(2)
        end
        sos.each do |args|
            add_sort_order(*args)
        end
    end
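The string-splitting step can be examined on its own. In this sketch parse_sort_order is a hypothetical helper, and each_slice(2) stands in for the every(2) call in the listing, on the assumption that every(2) groups elements into pairs.

```ruby
# each_slice(2) stands in for the every(2) helper used in the listing
# (assumed to group the elements into pairs).
def parse_sort_order(expression)
  expression.split(/(ASC|DESC|asc|desc)/).map(&:strip).each_slice(2).to_a
end

p parse_sort_order("Gene.length asc Gene.proteins.length desc")
# [["Gene.length", "asc"], ["Gene.proteins.length", "desc"]]

# The direction words are matched anywhere in the string, including inside
# a path, so a path containing one of them is split in the wrong place:
p parse_sort_order("Gene.description asc")
# [["Gene.", "desc"], ["ription", "asc"]]
```

Because the method also accepts an Array, passing pairs such as [["Gene.description", "asc"]] sidesteps the string parsing altogether.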
subclass_constraints()

Return all the constraints that restrict the class of paths in the query.

    # File lib/intermine/query.rb
    def subclass_constraints
        return @constraints.select {|x| x.is_a?(SubClassConstraint)}
    end
subclasses()

Get the current sub-class map for this query.

This contains information about which fields of this query have been declared to be restricted to contain only a subclass of their normal type.

> query = service.query("Gene")
> query.where(:microArrayResults => service.model.table("FlyAtlasResult"))
> query.subclasses
=> {"Gene.microArrayResults" => "FlyAtlasResult"}

    # File lib/intermine/query.rb
    def subclasses
        subclasses = {}
        @constraints.each do |con|
            if con.is_a?(SubClassConstraint)
                subclasses[con.path.to_s] = con.sub_class.to_s
            end
        end
        return subclasses
    end
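A self-contained sketch of how such a map is assembled is shown below. The two Structs are hypothetical stand-ins for the library's constraint classes, and subclass_map takes the constraint list as an argument.

```ruby
# Hypothetical stand-ins for the library's constraint classes.
SubClassConstraint = Struct.new(:path, :sub_class)
ValueConstraint    = Struct.new(:path, :op, :value, :code)

# Keep only the subclass constraints, keyed by path.
def subclass_map(constraints)
  constraints.each_with_object({}) do |con, map|
    map[con.path.to_s] = con.sub_class.to_s if con.is_a?(SubClassConstraint)
  end
end

constraints = [
  ValueConstraint.new("Gene.symbol", "=", "eve", "A"),
  SubClassConstraint.new("Gene.microArrayResults", "FlyAtlasResult")
]
p subclass_map(constraints)
# {"Gene.microArrayResults"=>"FlyAtlasResult"}
```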
summaries(path, start=0, size=nil)

Return an Enumerable of summary items starting at the given offset.

summary = query.summaries("chromosome.primaryIdentifier")
top_chromosome = summary[0]["item"]
no_in_top_chrom = summary[0]["count"]

This can be made more efficient by passing in a size, i.e. if you only want the top item, pass in an offset of 0 and a size of 1 and only that row will be fetched.

    # File lib/intermine/query.rb
    def summaries(path, start=0, size=nil)
        q = self.clone
        q.add_views(path)
        return Results::SummaryReader.new(@url, q, start, size, path)
    end
summarise(path, start=0, size=nil)

Return a summary for a column as a Hash.

For numeric values the hash has four keys: “average”, “stdev”, “min”, and “max”.

summary = query.summarise("length")
puts summary["average"]

For non-numeric values, the hash will have each possible value as a key, and the count of the occurrences of that value in this query's result set as the corresponding value:

summary = query.summarise("chromosome.primaryIdentifier")
puts summary["2L"]

To limit the size of the result set you can use start and size as per normal queries - this has no real effect with numeric queries, which always return the same information.

    # File lib/intermine/query.rb
    def summarise(path, start=0, size=nil)
        t = make_path(add_prefix(path)).end_type
        if InterMine::Metadata::Model::NUMERIC_TYPES.include? t
            return Hash[summaries(path, start, size).first.map {|k, v| [k, v.to_f]}]
        else
            return Hash[summaries(path, start, size).map {|sum| [sum["item"], sum["count"]]}]
        end
    end
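The two Hash shapes can be illustrated without a service. All values below are made up for the example, and the assumption that numeric summaries arrive as strings is only a guess based on the to_f call in the listing.

```ruby
# Made-up summary rows, shaped like the hashes described above.
numeric_rows = [{ "average" => "1500.5", "stdev" => "200", "min" => "10", "max" => "9000" }]
item_rows    = [{ "item" => "2L", "count" => 120 }, { "item" => "X", "count" => 95 }]

# Numeric column: a single row of statistics, converted to floats.
def numeric_summary(rows)
  Hash[rows.first.map { |k, v| [k, v.to_f] }]
end

# Non-numeric column: one entry per distinct value, mapped to its count.
def item_summary(rows)
  Hash[rows.map { |row| [row["item"], row["count"]] }]
end

puts numeric_summary(numeric_rows)["average"]  # 1500.5
puts item_summary(item_rows)["2L"]             # 120
```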
to_s()

Return the textual representation of the query. Here it returns the Query XML.

    # File lib/intermine/query.rb
    def to_s
        return to_xml.to_s
    end
to_xml()

Return an XML document node representing the XML form of the query.

This is the canonical serialisable form of the query.

    # File lib/intermine/query.rb
    def to_xml
        doc = REXML::Document.new

        if @sort_order.empty?
            so = SortOrder.new(@views.first, "ASC")
        else
            so = @sort_order.join(" ")
        end

        query = doc.add_element("query", {
            "name" => @name,
            "model" => @model.name,
            "title" => @title,
            "sortOrder" => so,
            "view" => @views.join(" "),
            "constraintLogic" => @logic
        }.delete_if { |k, v | !v })
        @joins.each { |join|
            query.add_element("join", join.attrs)
        }
        subclass_constraints.each { |con|
            query.add_element(con.to_elem)
        }
        coded_constraints.each { |con|
            query.add_element(con.to_elem)
        }
        return doc
    end
used_codes()

Return the list of codes currently used by the query.

    # File lib/intermine/query.rb
    def used_codes
        if @constraints.empty?
            return []
        else
            return @constraints.select {|x| !x.is_a?(SubClassConstraint)}.map {|x| x.code}
        end
    end
view=(*view)

Replace any currently existing views with the given view list. If the view is not already an Array, it will be split by commas and whitespace.

    # File lib/intermine/query.rb
    def view=(*view)
        @views = []
        view.each do |v|
            if v.is_a?(Array)
                views = v
            else
                views = v.to_s.split(/(?:,\s*|\s+)/)
            end
            add_views(*views)
        end
        return self
    end
Also aliased as: select
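The splitting rule for non-Array views is easy to check in isolation. split_views is a hypothetical helper that reuses the same regular expression as the listing.

```ruby
# Arrays are used as-is; anything else is split on commas and/or whitespace.
def split_views(view)
  view.is_a?(Array) ? view : view.to_s.split(/(?:,\s*|\s+)/)
end

p split_views("Gene.symbol, Gene.length Gene.name")
# ["Gene.symbol", "Gene.length", "Gene.name"]
p split_views(["Gene.symbol", "Gene.length"])
# ["Gene.symbol", "Gene.length"]
```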
where(*wheres)

Add a constraint clause to the query.

query.where(:symbol => "eve")
query.where(:symbol => %w{eve h bib zen})
query.where(:length => {:le => 100}, :symbol => "eve*")

Interprets the arguments in a style similar to that of ActiveRecord constraints, and adds them to the query. If multiple constraints are supplied in a single hash (as in the third example), then the order in which they are applied to the query (and thus the codes they will receive) is not predictable. To determine the order, use chained where clauses or use multiple hashes:

query.where({:length => {:le => 100}}, {:symbol => "eve*"})

Returns self to support method chaining.

    # File lib/intermine/query.rb
    def where(*wheres)
        if @views.empty?
            self.select('*')
        end
        wheres.each do |w|
            w.each do |k,v|
                if v.is_a?(Hash)
                    parameters = {:path => k}
                    v.each do |subk, subv|
                        normalised_k = subk.to_s.upcase.gsub(/_/, " ")
                        if subk == :with
                            parameters[:extra_value] = subv
                        elsif subk == :sub_class
                            parameters[subk] = subv
                        elsif subk == :code
                            parameters[:code] = subv
                        elsif LoopConstraint.valid_ops.include?(normalised_k)
                            parameters[:op] = normalised_k
                            parameters[:loopPath] = subv
                        else
                            if subv.nil?
                                if subk == "="
                                    parameters[:op] = "IS NULL"
                                elsif subk == "!="
                                    parameters[:op] = "IS NOT NULL"
                                else
                                    parameters[:op] = normalised_k
                                end
                            elsif subv.is_a?(Range) or subv.is_a?(Array)
                                if subk == "="
                                    parameters[:op] = "ONE OF"
                                elsif subk == "!="
                                    parameters[:op] = "NONE OF"
                                else
                                    parameters[:op] = normalised_k
                                end
                                parameters[:values] = subv.to_a
                            elsif subv.is_a?(Lists::List)
                                if subk == "="
                                    parameters[:op] = "IN"
                                elsif subk == "!="
                                    parameters[:op] = "NOT IN"
                                else
                                    parameters[:op] = normalised_k
                                end
                                parameters[:value] = subv.name
                            else
                                parameters[:op] = normalised_k
                                parameters[:value] = subv
                            end
                        end
                    end
                    add_constraint(parameters)
                elsif v.is_a?(Range) or v.is_a?(Array)
                    add_constraint(k.to_s, 'ONE OF', v.to_a)
                elsif v.is_a?(InterMine::Metadata::ClassDescriptor)
                    add_constraint(:path => k.to_s, :sub_class => v.name)
                elsif v.is_a?(InterMine::Lists::List)
                    add_constraint(k.to_s, 'IN', v.name)
                elsif v.nil?
                    add_constraint(k.to_s, "IS NULL")
                else
                    if path(k.to_s).is_attribute?
                        add_constraint(k.to_s, '=', v)
                    else
                        add_constraint(k.to_s, 'LOOKUP', v)
                    end
                end
            end
        end
        return self
    end
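Two parts of this method lend themselves to a small standalone sketch: the normalisation of operator keys, and the choice of operator for plain (non-Hash) values. normalise_op and simple_constraint are hypothetical helpers; the ClassDescriptor and List branches are left out, and the is_attribute flag replaces the path lookup the real method performs.

```ruby
# Operator keys are upper-cased and underscores become spaces.
def normalise_op(key)
  key.to_s.upcase.gsub(/_/, " ")
end

# Pick the constraint arguments for a plain value, as the listing does.
def simple_constraint(path, value, is_attribute = true)
  case value
  when Range, Array then [path.to_s, "ONE OF", value.to_a]
  when nil          then [path.to_s, "IS NULL"]
  else                   [path.to_s, is_attribute ? "=" : "LOOKUP", value]
  end
end

puts normalise_op(:one_of)                       # ONE OF
p simple_constraint(:symbol, %w{eve h bib zen})  # ["symbol", "ONE OF", ["eve", "h", "bib", "zen"]]
p simple_constraint(:length, 1..3)               # ["length", "ONE OF", [1, 2, 3]]
p simple_constraint(:symbol, nil)                # ["symbol", "IS NULL"]
p simple_constraint(:proteins, "eve", false)     # ["proteins", "LOOKUP", "eve"]
```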