class InterMine::PathQuery::Query
A class representing a structured query against an InterMine Data-Warehouse
Queries represent structured requests for data from an InterMine data-warehouse. They consist basically of the output columns you select and a set of constraints on the results to return. These are known as the “view” and the “constraints”. In a nod to the SQL origins of the queries, and to the syntax of ActiveRecord, there is both a method-chaining SQL-ish DSL and a more conventional InterMine DSL in which each step is a separate method call.
    query = service.query("Gene").select("*").where("proteins.molecularWeight" => {">" => 10000})
    query.each_result do |gene|
      puts gene.symbol
    end
OR:
    query = service.query("Gene")
    query.add_views("*")
    query.add_constraint("proteins.molecularWeight", ">", 10000)
    ...
The main difference from SQL is that joins between tables are implicit and automatic. Simply by naming the column “Gene.proteins.molecularWeight” we have access to the protein table joined onto the gene table. (A consequence of this is that all queries must have a unique root that all paths descend from, and we do not permit right outer joins.)
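The “unique root” rule can be shown without a server. In the minimal pure-Ruby sketch below, the root of a dotted path is taken to be its first segment, and a set of paths is only acceptable if all of them share that root. The helper `common_root` is hypothetical and is not part of the InterMine client.

```ruby
# Hypothetical helper, not part of the InterMine client: returns the single
# root class shared by all dotted paths, or raises if there is more than one.
def common_root(paths)
  roots = paths.map { |p| p.split(".").first }.uniq
  raise ArgumentError, "paths have multiple roots: #{roots.join(', ')}" if roots.size > 1
  roots.first
end

# "Gene.proteins.molecularWeight" implicitly joins Gene -> proteins.
puts common_root(["Gene.symbol", "Gene.proteins.molecularWeight"])  # prints Gene

begin
  common_root(["Gene.symbol", "Protein.name"])
rescue ArgumentError => e
  puts e.message  # prints "paths have multiple roots: Gene, Protein"
end
```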
You can define the following features of a query:
* The output columns
* The filtering constraints (what values certain columns must or must not have)
* The sort order of the results
* The way constraints are combined (AND or OR)
In processing results, there are two powerful result formats available, depending on whether you want to process results row by row, or whether you would like the information grouped into logically coherent records. The latter is more similar to the ORM model, and can be seen above. The mechanisms we offer for row access allow accessing cell values of the result table transparently by index or column-name.
- author: Alex Kalderimis dev@intermine.org
- homepage
- Licence: Copyright (C) 2002-2011 FlyMine
This code may be freely distributed and modified under the terms of the GNU Lesser General Public Licence. This should be distributed with the code. See the LICENSE file for more information or www.gnu.org/copyleft/lesser.html.
Constants
- HIGHEST_CODE
The last possible constraint code
- LOWEST_CODE
The first possible constraint code
Attributes
All the current constraints on the query
All the current Join objects on the query
URLs for internal consumption.
URLs for internal consumption.
The current logic (as a LogicGroup)
The data model associated with the query
The (optional) name of the query. Used in automatic access (eg: “query1”)
The root class of the query.
The service this query is associated with
The number of rows to return - defaults to nil (all rows)
The current sort-order.
The index of the first row to return - defaults to 0 (first row)
A human readable title of the query (eg: “Gene –> Protein Domain”)
All the columns currently selected for output.
Public Class Methods
Whether or not the argument is a valid constraint code.
To be valid, it must be a one-character string between A and Z inclusive.
    # File lib/intermine/query.rb, line 904
    def self.is_valid_code(str)
      return (str.length == 1) && (str >= LOWEST_CODE) && (str <= HIGHEST_CODE)
    end
Construct a new query object. You should not use this directly. Instead, use the factory methods in Service.
query = service.query("Gene")
    # File lib/intermine/query.rb, line 290
    def initialize(model, root=nil, service=nil)
      @model = model
      @service = service
      @url = (@service.nil?) ? nil : @service.root + Service::QUERY_RESULTS_PATH
      @list_upload_uri = (@service.nil?) ? nil : @service.root + Service::QUERY_TO_LIST_PATH
      @list_append_uri = (@service.nil?) ? nil : @service.root + Service::QUERY_APPEND_PATH
      if root
        @root = InterMine::Metadata::Path.new(root, model).rootClass
      end
      @size = nil
      @start = 0
      @constraints = []
      @joins = []
      @views = []
      @sort_order = []
      @used_codes = []
      @logic_parser = LogicParser.new(self)
      @constraint_factory = ConstraintFactory.new(self)
    end
Return a parser for deserialising queries.
    parser = Query.parser(service.model)
    query = parser.parse(string)
    query.each_row do |r|
      puts r.to_h
    end
    # File lib/intermine/query.rb, line 318
    def self.parser(model)
      return QueryLoader.new(model)
    end
Public Instance Methods
Add a constraint to the query matching the given parameters, and return the created constraint.
con = query.add_constraint("length", ">", 500)
Note that (at least for now) the argument styles used by where and add_constraint are not compatible. This is on the TODO list.
    # File lib/intermine/query.rb, line 761
    def add_constraint(*parameters)
      con = @constraint_factory.make_constraint(parameters)
      @constraints << con
      return con
    end
Declare how a particular join should be treated.
The default join style is an INNER join, but joins can optionally be declared to be LEFT OUTER joins. The difference is that with an inner join, each join in the query implicitly constrains the values of that path to be non-null, whereas an outer join allows null values in the joined path. If the given path contains a chain of joins, the join style is applied to the last section.
    query = service.query("Gene")
    # Allow genes without proteins
    query.add_join("proteins")
    # Demand the results contain only those genes that have interactions that have interactingGenes,
    # but allow those interactingGenes to not have any proteins.
    query.add_join("interactions.interactingGenes.proteins")
The valid join styles are OUTER and INNER (case-insensitive). There is never any need to declare a join to be INNER, as joins are inner by default. Consider using Query#outerjoin, which is more explicitly declarative.
    # File lib/intermine/query.rb, line 701
    def add_join(path, style="OUTER")
      p = InterMine::Metadata::Path.new(add_prefix(path), @model, subclasses)
      if @root.nil?
        @root = p.rootClass
      end
      @joins << Join.new(p, style)
      return self
    end
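The difference between the two join styles can be shown without a server. The sketch below uses plain Ruby hashes as invented stand-in data. It shows that an inner join drops a gene with no proteins, while a left outer join keeps that gene paired with nil. The helpers `inner_join` and `outer_join` are hypothetical and only mimic the row shapes the two styles produce.

```ruby
# Invented stand-in data: each gene lists the identifiers of its proteins.
genes = [
  { symbol: "eve", proteins: ["P1", "P2"] },
  { symbol: "zen", proteins: [] }
]

# INNER join: one row per (gene, protein) pair; genes without proteins vanish.
def inner_join(genes)
  genes.flat_map { |g| g[:proteins].map { |p| [g[:symbol], p] } }
end

# LEFT OUTER join: genes without proteins are kept, paired with nil.
def outer_join(genes)
  genes.flat_map do |g|
    g[:proteins].empty? ? [[g[:symbol], nil]] : g[:proteins].map { |p| [g[:symbol], p] }
  end
end

p inner_join(genes)  # [["eve", "P1"], ["eve", "P2"]]
p outer_join(genes)  # [["eve", "P1"], ["eve", "P2"], ["zen", nil]]
```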
Adds the root prefix to the given string, unless the string already starts with the name of the root class.
Arguments:
- x: An object with a to_s method
Returns the prefixed string.
    # File lib/intermine/query.rb, line 914
    def add_prefix(x)
      x = x.to_s
      if @root && !x.start_with?(@root.name)
        return @root.name + "." + x
      else
        return x
      end
    end
Add a sort order element to sort order information. A sort order consists of the name of an output column and (optionally) the direction to sort in. The default direction is “ASC”. The valid directions are “ASC” and “DESC” (case-insensitive).
    query.add_sort_order("length")
    query.add_sort_order("proteins.primaryIdentifier", "desc")
    # File lib/intermine/query.rb, line 725
    def add_sort_order(path, direction="ASC")
      p = self.path(path)
      if !@views.include? p
        raise ArgumentError, "Sort order (#{p}) not in view (#{@views.map {|v| v.to_s}.inspect} in #{self.name || 'unnamed query'})"
      end
      @sort_order << SortOrder.new(p, direction)
      return self
    end
Add the given views (output columns) to the query.
Any columns ending in “*” will be interpreted as a request to add all attribute columns from that table to the query.
Any columns that name a class or reference will add the id of that object to the query. This is helpful for creating lists and other specialist services.
    query = service.query("Gene")
    query.add_views("*")
    query.add_to_select("*")
    query.add_views("proteins.*")
    query.add_views("pathways.*", "organism.shortName")
    query.add_views("proteins", "exons")
    # File lib/intermine/query.rb, line 617
    def add_views(*views)
      views.flatten.map do |x|
        y = add_prefix(x)
        if y.end_with?("*")
          prefix = y.chomp(".*")
          path = make_path(prefix)
          add_views(path.end_cd.attributes.map {|x| prefix + "." + x.name})
        else
          path = make_path(y)
          path = make_path(y.to_s + ".id") unless path.is_attribute?
          if @root.nil?
            @root = path.rootClass
          end
          @views << path
        end
      end
      return self
    end
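The two expansion rules can be sketched against a toy model. In the sketch below, `MODEL` is a hypothetical hash that maps each class to its attributes and references. The real client consults the service's data model through `Path` objects instead. The helpers `end_class` and `expand_view` are also illustrative names, not client API.

```ruby
# Hypothetical toy model: class name => attribute names and reference targets.
MODEL = {
  "Gene"    => { attributes: %w[symbol length], references: { "proteins" => "Protein" } },
  "Protein" => { attributes: %w[primaryIdentifier molecularWeight], references: {} }
}

# Resolve the class at the end of a dotted path such as "Gene.proteins".
def end_class(path)
  parts = path.split(".")
  parts.drop(1).inject(parts.first) { |cls, ref| MODEL.fetch(cls)[:references].fetch(ref) }
end

# Expand one requested view following the rules described for add_views:
#   "X.*"              -> every attribute of the class at X
#   a class/reference  -> its "id" column
#   anything else      -> assumed to be an attribute, kept unchanged
def expand_view(view)
  if view.end_with?(".*")
    prefix = view.chomp(".*")
    MODEL.fetch(end_class(prefix))[:attributes].map { |a| "#{prefix}.#{a}" }
  else
    *head, last = view.split(".")
    cls = head.empty? ? nil : end_class(head.join("."))
    if cls.nil? || MODEL.fetch(cls)[:references].key?(last)
      ["#{view}.id"]
    else
      [view]
    end
  end
end

p expand_view("Gene.proteins.*")  # ["Gene.proteins.primaryIdentifier", "Gene.proteins.molecularWeight"]
p expand_view("Gene.proteins")    # ["Gene.proteins.id"]
p expand_view("Gene.symbol")      # ["Gene.symbol"]
```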
Return all result record objects returned by running this query.
    # File lib/intermine/query.rb, line 551
    def all
      return self.results
    end
Return all the rows returned by running the query.
    # File lib/intermine/query.rb, line 556
    def all_rows
      return self.rows
    end
Return all the constraints that have codes and can thus participate in logic.
    # File lib/intermine/query.rb, line 324
    def coded_constraints
      return @constraints.select {|x| !x.is_a?(SubClassConstraint)}
    end
Return the number of result rows this query will return in its current state. This makes a very small request to the webservice, and is the most efficient method of getting the size of the result set.
    # File lib/intermine/query.rb, line 467
    def count
      return results_reader.get_size
    end
Iterate over the results, one record at a time.
    query.each_result do |gene|
      puts gene.symbol
      gene.proteins.each do |prot|
        puts prot.primaryIdentifier
      end
    end
This method is now deprecated and will be removed in version 1. Please use Query#results instead.
    # File lib/intermine/query.rb, line 456
    def each_result(start=nil, size=nil)
      start = start.nil? ? @start : start
      size = size.nil? ? @size : size
      results_reader(start, size).each_result {|row|
        yield row
      }
    end
Iterate over the results of this query one row at a time.
Rows support both array-like, index-based access and hash-like, key-based access. For key-based access you can use either the full path or the headless short version:
    query.each_row do |r|
      puts r["Gene.symbol"], r["proteins.primaryIdentifier"]
      puts r[0]
      puts r.to_a # Materialize the row as an Array
      puts r.to_h # Materialize the row as a Hash
    end
This method is now deprecated and will be removed in version 1. Please use Query#rows instead.
    # File lib/intermine/query.rb, line 436
    def each_row(start=nil, size=nil)
      start = start.nil? ? @start : start
      size = size.nil? ? @size : size
      results_reader(start, size).each_row {|row|
        yield row
      }
    end
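Index-based and key-based access can be sketched with a small class. `SketchRow` is hypothetical and is not the client's `ResultRow`. It assumes the row knows the query's root class name, so that a headless key such as `"proteins.primaryIdentifier"` can be resolved by prefixing that name.

```ruby
# Hypothetical row type illustrating index, full-path and headless-path access.
class SketchRow
  def initialize(root, views, values)
    @root = root      # root class name, e.g. "Gene"
    @views = views    # full column paths, in view order
    @values = values  # cell values, in the same order
  end

  # Integer keys index by position; string keys are column paths,
  # with the root prefix added when the caller omits it.
  def [](key)
    return @values[key] if key.is_a?(Integer)
    path = key.start_with?("#{@root}.") ? key : "#{@root}.#{key}"
    idx = @views.index(path)
    raise ArgumentError, "#{key} is not in the view" if idx.nil?
    @values[idx]
  end

  def to_a
    @values.dup
  end

  def to_h
    Hash[@views.zip(@values)]
  end
end

row = SketchRow.new("Gene", ["Gene.symbol", "Gene.proteins.primaryIdentifier"], ["eve", "P1"])
puts row["Gene.symbol"]                 # eve
puts row["proteins.primaryIdentifier"]  # P1
puts row[0]                             # eve
p row.to_h
```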
Return true if the other query has exactly the same configuration, and belongs to the same service.
    # File lib/intermine/query.rb, line 370
    def eql?(other)
      if other.is_a? Query
        # Same service and identical serialised XML
        return self.service == other.service && self.to_xml.to_s == other.to_xml.to_s
      else
        return false
      end
    end
Get the first result record from the query, starting at the given offset. If the offset is large, this is not an efficient way to retrieve the data, and you may wish to consider a looping approach or row-based access instead.
    # File lib/intermine/query.rb, line 564
    def first(start=0)
      current_row = 0
      # Have to iterate as start refers to row count
      results_reader.each_result { |r|
        if current_row == start
          return r
        end
        current_row += 1
      }
      return nil
    end
Get the first row of results from the query, starting at the given offset.
    # File lib/intermine/query.rb, line 577
    def first_row(start = 0)
      # Read rows (not result objects), fetching a single row at the offset
      return self.rows(start, 1).first
    end
Get the constraint on the query with the given code. Raises an error if there is no such constraint.
    # File lib/intermine/query.rb, line 583
    def get_constraint(code)
      @constraints.each do |x|
        if x.respond_to?(:code) and x.code == code
          return x
        end
      end
      raise ArgumentError, "#{code} not in query"
    end
Return an informative textual representation of the query.
    # File lib/intermine/query.rb, line 938
    def inspect
      return "<#{self.class.name} query=#{self.to_s.inspect}>"
    end
Set the maximum number of rows this query will return.
This method can be used to set a default maximum size for a query. Set it to nil to return all rows. Any value supplied to each_row or each_result (and likewise rows or results) overrides the value given here, unless that value is nil, in which case this value is used. If unset, the query will return all results.
Returns self for method chaining.
    # File lib/intermine/query.rb, line 400
    def limit(size)
      @size = size
      return self
    end
    # File lib/intermine/query.rb, line 636
    def make_path(path)
      return InterMine::Metadata::Path.new(path, @model, subclasses)
    end
Get the next available code for the query.
    # File lib/intermine/query.rb, line 883
    def next_code
      c = LOWEST_CODE
      while Query.is_valid_code(c)
        return c unless used_codes.include?(c)
        c = c.next
      end
      raise RuntimeError, "Maximum number of codes reached - all 26 have been allocated"
    end
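Code allocation can be sketched on its own: walk from the lowest code to the highest and return the first one not yet in use. The sketch assumes, consistent with the validity rule and the error message above, that codes run from "A" to "Z". The helper names `valid_code?` and `next_free_code` are hypothetical.

```ruby
LOWEST = "A"   # assumed value of LOWEST_CODE
HIGHEST = "Z"  # assumed value of HIGHEST_CODE

# Same test as Query.is_valid_code: a single character between A and Z.
def valid_code?(str)
  str.length == 1 && str >= LOWEST && str <= HIGHEST
end

# Return the first valid code not in used, scanning A, B, C, ...
def next_free_code(used)
  c = LOWEST
  while valid_code?(c)
    return c unless used.include?(c)
    c = c.next  # "A".next == "B"; "Z".next == "AA", which fails valid_code?
  end
  raise "Maximum number of codes reached - all 26 have been allocated"
end

puts next_free_code([])         # A
puts next_free_code(%w[A B D])  # C
```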
Set the index of the first row of results this query will return.
This method can be used to set a value for the query offset. Any value supplied to each_row or each_result (and likewise rows or results) overrides the value given here. If unset, results will start from the first row.
Returns self for method chaining.
    # File lib/intermine/query.rb, line 415
    def offset(start)
      @start = start
      return self
    end
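The precedence between the query-level defaults set by `limit` and `offset` and the per-call arguments of `rows`, `results`, `each_row` and `each_result` comes down to one rule, visible in their source: a non-nil argument wins, otherwise the query's value is used. A minimal sketch follows. The `effective_window` name is hypothetical.

```ruby
# Mirrors the "start = start.nil? ? @start : start" pattern used by the
# result methods: per-call values override query defaults unless nil.
def effective_window(query_start, query_size, start = nil, size = nil)
  [start.nil? ? query_start : start, size.nil? ? query_size : size]
end

# A query configured with offset(10).limit(5):
p effective_window(10, 5)            # [10, 5]   defaults used
p effective_window(10, 5, 0)         # [0, 5]    start overridden
p effective_window(10, 5, nil, 100)  # [10, 100] size overridden
p effective_window(0, nil)           # [0, nil]  nil size means all rows
```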
Explicitly declare a join to be an outer join.
    # File lib/intermine/query.rb, line 713
    def outerjoin(path)
      return add_join(path)
    end
Return the parameter hash for running this query in its current state.
    # File lib/intermine/query.rb, line 924
    def params
      hash = {"query" => self.to_xml}
      if @service and @service.token
        hash["token"] = @service.token
      end
      return hash
    end
Returns a Path object constructed from the given path-string, taking the current state of the query into account (its data-model and subclass constraints).
    # File lib/intermine/query.rb, line 770
    def path(pathstr)
      return InterMine::Metadata::Path.new(add_prefix(pathstr), @model, subclasses)
    end
Remove the constraint with the given code from the query. If no such constraint exists, no error will be raised.
    # File lib/intermine/query.rb, line 595
    def remove_constraint(code)
      @constraints.reject! do |x|
        x.respond_to?(:code) and x.code == code
      end
    end
Return objects corresponding to the type of data requested, starting at the given row offset. Returns an Enumerable of InterMineObject, where each value is read one at a time from the connection.
    genes = query.results
    genes.last.symbol # => "eve"
    # File lib/intermine/query.rb, line 497
    def results(start=nil, size=nil)
      start = start.nil? ? @start : start
      size = size.nil? ? @size : size
      return Results::ObjectReader.new(@url, self, start, size)
    end
Get your own result reader for handling the results at a low level. If no columns have been selected for output before requesting results, all attribute columns will be selected.
    # File lib/intermine/query.rb, line 381
    def results_reader(start=0, size=nil)
      if @views.empty?
        select("*")
      end
      return Results::ResultsReader.new(@url, self, start, size)
    end
Returns an Enumerable of ResultRow objects containing the data returned by running this query, starting at the given offset and containing up to the given maximum size.
The webservice enforces a maximum page size of 10,000,000 rows, independent of any size you specify. For larger result sets, page through the results using the start and size arguments.
    rows = query.rows
    rows.last["symbol"] # => "eve"
    # File lib/intermine/query.rb, line 483
    def rows(start=nil, size=nil)
      start = start.nil? ? @start : start
      size = size.nil? ? @size : size
      return Results::RowReader.new(@url, self, start, size)
    end
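Paging through a large result set means requesting successive windows with the `start` and `size` arguments until a short page comes back. The sketch below is hypothetical. `fetch_page` stands in for a call such as `query.rows(start, size).to_a`, and a local array backs it so the example runs without a service.

```ruby
# Stand-in data source: 25 fake rows.
DATA = (1..25).map { |i| "row#{i}" }

# Hypothetical stand-in for query.rows(start, size).to_a.
def fetch_page(start, size)
  DATA[start, size] || []
end

# Yield every row by fetching fixed-size pages.
def each_paged_row(page_size)
  start = 0
  loop do
    page = fetch_page(start, page_size)
    page.each { |row| yield row }
    break if page.size < page_size  # a short (or empty) page means we are done
    start += page_size
  end
end

collected = []
each_paged_row(10) { |r| collected << r }
puts collected.size  # 25
```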
    # File lib/intermine/query.rb, line 546
    def sequences(range)
      return Results::SeqReader.new(@service.root, clone, range)
    end
Set the logic to the given value.
If the value is a logic string, it will be parsed and checked for consistency.
Returns self to support chaining.
    # File lib/intermine/query.rb, line 871
    def set_logic(value)
      if value.is_a?(LogicGroup)
        @logic = value
      else
        @logic = @logic_parser.parse_logic(value)
      end
      return self
    end
Set the sort order completely, replacing the current sort order.
query.sortOrder = "Gene.length asc Gene.proteins.length desc"
The sort order expression will be parsed and checked for conformity with the current state of the query.
    # File lib/intermine/query.rb, line 740
    def sortOrder=(so)
      if so.is_a?(Array)
        sos = so
      else
        sos = so.split(/(ASC|DESC|asc|desc)/).map {|x| x.strip}.every(2)
      end
      sos.each do |args|
        add_sort_order(*args)
      end
    end
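The string form accepted by `sortOrder=` alternates paths and directions. The following is a hypothetical sketch of such a parser, not the client's implementation. It splits on whitespace and treats a token that is exactly `asc` or `desc` (in any case) as the direction of the preceding path. A path with no direction defaults to ASC, as in `add_sort_order`. Because only whole tokens are matched, a path like `Gene.description` is never cut at "desc". `parse_sort_order` is a made-up name.

```ruby
# Parse "Gene.length asc Gene.proteins.length desc" into [path, direction] pairs.
def parse_sort_order(str)
  pairs = []
  str.strip.split(/\s+/).each do |tok|
    if tok =~ /\A(asc|desc)\z/i
      # A direction must follow a path that does not yet have one.
      raise ArgumentError, "direction #{tok} has no path" if pairs.empty? || pairs.last[1]
      pairs.last[1] = tok.upcase
    else
      pairs << [tok, nil]
    end
  end
  # Paths given without a direction default to ASC.
  pairs.map { |path, dir| [path, dir || "ASC"] }
end

p parse_sort_order("Gene.length asc Gene.proteins.length desc")
# [["Gene.length", "ASC"], ["Gene.proteins.length", "DESC"]]
p parse_sort_order("Gene.description DESC Gene.symbol")
# [["Gene.description", "DESC"], ["Gene.symbol", "ASC"]]
```

Each resulting pair could then be passed to `add_sort_order(*pair)`, which is what the method above does with its own pairs.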
Return all the constraints that restrict the class of paths in the query.
    # File lib/intermine/query.rb, line 330
    def subclass_constraints
      return @constraints.select {|x| x.is_a?(SubClassConstraint)}
    end
Get the current sub-class map for this query.
This contains information about which fields of this query have been declared to be restricted to contain only a subclass of their normal type.
    > query = service.query("Gene")
    > query.where(:microArrayResults => service.model.table("FlyAtlasResult"))
    > query.subclasses
    => {"Gene.microArrayResults" => "FlyAtlasResult"}
    # File lib/intermine/query.rb, line 671
    def subclasses
      subclasses = {}
      @constraints.each do |con|
        if con.is_a?(SubClassConstraint)
          subclasses[con.path.to_s] = con.sub_class.to_s
        end
      end
      return subclasses
    end
Return an Enumerable of summary items starting at the given offset.
    top = query.summaries("chromosome.primaryIdentifier").first
    top_chromosome = top["item"]
    no_in_top_chrom = top["count"]
This can be made more efficient by passing in a size: if you only want the top item, pass in an offset of 0 and a size of 1, and only that row will be fetched.
    # File lib/intermine/query.rb, line 513
    def summaries(path, start=0, size=nil)
      q = self.clone
      q.add_views(path)
      return Results::SummaryReader.new(@url, q, start, size, path)
    end
Return a summary for a column as a Hash
For numeric values the hash has four keys: “average”, “stdev”, “min”, and “max”.
    summary = query.summarise("length")
    puts summary["average"]
For non-numeric values, the hash will have each possible value as a key, and the count of the occurrences of that value in this query's result set as the corresponding value:
    summary = query.summarise("chromosome.primaryIdentifier")
    puts summary["2L"]
To limit the size of the result set you can use start and size as per normal queries - this has no real effect with numeric queries, which always return the same information.
    # File lib/intermine/query.rb, line 537
    def summarise(path, start=0, size=nil)
      t = make_path(add_prefix(path)).end_type
      if InterMine::Metadata::Model::NUMERIC_TYPES.include? t
        return Hash[summaries(path, start, size).first.map {|k, v| [k, v.to_f]}]
      else
        return Hash[summaries(path, start, size).map {|sum| [sum["item"], sum["count"]]}]
      end
    end
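The two shapes of hash returned by `summarise` can be reproduced locally from a list of column values, which may help when reading its output. This only illustrates the shapes. The real figures are computed by the webservice, and the source does not say which standard-deviation formula it uses; the sketch uses the population form. `local_summary` is hypothetical.

```ruby
# Illustrate the two summary shapes for a column's values.
def local_summary(values)
  if values.all? { |v| v.is_a?(Numeric) }
    n = values.size.to_f
    mean = values.inject(0.0) { |s, v| s + v } / n
    var = values.inject(0.0) { |s, v| s + (v - mean)**2 } / n  # population variance
    { "average" => mean, "stdev" => Math.sqrt(var),
      "min" => values.min.to_f, "max" => values.max.to_f }
  else
    # Non-numeric: value => number of occurrences.
    values.each_with_object(Hash.new(0)) { |v, counts| counts[v] += 1 }
  end
end

numeric = local_summary([100, 200, 300])
puts numeric["average"]  # 200.0
counts = local_summary(%w[2L 2L 3R X])
puts counts["2L"]        # 2
```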
Return the textual representation of the query. Here it returns the Query XML.
    # File lib/intermine/query.rb, line 933
    def to_s
      return to_xml.to_s
    end
Return an XML document node representing the XML form of the query.
This is the canonical serialisable form of the query.
    # File lib/intermine/query.rb, line 338
    def to_xml
      doc = REXML::Document.new

      if @sort_order.empty?
        so = SortOrder.new(@views.first, "ASC")
      else
        so = @sort_order.join(" ")
      end

      query = doc.add_element("query", {
        "name" => @name,
        "model" => @model.name,
        "title" => @title,
        "sortOrder" => so,
        "view" => @views.join(" "),
        "constraintLogic" => @logic
      }.delete_if { |k, v | !v })
      @joins.each { |join|
        query.add_element("join", join.attrs)
      }
      subclass_constraints.each { |con|
        query.add_element(con.to_elem)
      }
      coded_constraints.each { |con|
        query.add_element(con.to_elem)
      }
      return doc
    end
Return the list of currently used codes by the query.
    # File lib/intermine/query.rb, line 893
    def used_codes
      if @constraints.empty?
        return []
      else
        return @constraints.select {|x| !x.is_a?(SubClassConstraint)}.map {|x| x.code}
      end
    end
Replace any currently existing views with the given view list. If the view is not already an Array, it will be split by commas and whitespace.
    # File lib/intermine/query.rb, line 645
    def view=(*view)
      @views = []
      view.each do |v|
        if v.is_a?(Array)
          views = v
        else
          views = v.to_s.split(/(?:,\s*|\s+)/)
        end
        add_views(*views)
      end
      return self
    end
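The splitting rule used by `view=` accepts commas, whitespace, or both as separators. The regular expression below is copied from the method above. Wrapping it in a standalone `split_view` helper, a hypothetical name, is just for illustration.

```ruby
# Same pattern as view=: a comma followed by optional whitespace, or a run of whitespace.
def split_view(view)
  view.is_a?(Array) ? view : view.to_s.split(/(?:,\s*|\s+)/)
end

p split_view("symbol, length proteins.name")  # ["symbol", "length", "proteins.name"]
p split_view(%w[symbol length])               # ["symbol", "length"]
```

With this pattern, a space before a comma (`"symbol ,length"`) produces an empty entry between the two names.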
Add a constraint clause to the query.
    query.where(:symbol => "eve")
    query.where(:symbol => %w{eve h bib zen})
    query.where(:length => {:le => 100}, :symbol => "eve*")
Interprets the arguments in a style similar to that of ActiveRecord constraints, and adds them to the query. If multiple constraints are supplied in a single hash (as in the third example), then the order in which they are applied to the query (and thus the codes they will receive) is not predictable. To control the order, use chained where clauses or pass multiple hashes:
query.where({:length => {:le => 100}}, {:symbol => "eve*"})
Returns self to support method chaining.
    # File lib/intermine/query.rb, line 792
    def where(*wheres)
      if @views.empty?
        self.select('*')
      end
      wheres.each do |w|
        w.each do |k,v|
          if v.is_a?(Hash)
            parameters = {:path => k}
            v.each do |subk, subv|
              normalised_k = subk.to_s.upcase.gsub(/_/, " ")
              if subk == :with
                parameters[:extra_value] = subv
              elsif subk == :sub_class
                parameters[subk] = subv
              elsif subk == :code
                parameters[:code] = subv
              elsif LoopConstraint.valid_ops.include?(normalised_k)
                parameters[:op] = normalised_k
                parameters[:loopPath] = subv
              else
                if subv.nil?
                  if subk == "="
                    parameters[:op] = "IS NULL"
                  elsif subk == "!="
                    parameters[:op] = "IS NOT NULL"
                  else
                    parameters[:op] = normalised_k
                  end
                elsif subv.is_a?(Range) or subv.is_a?(Array)
                  if subk == "="
                    parameters[:op] = "ONE OF"
                  elsif subk == "!="
                    parameters[:op] = "NONE OF"
                  else
                    parameters[:op] = normalised_k
                  end
                  parameters[:values] = subv.to_a
                elsif subv.is_a?(Lists::List)
                  if subk == "="
                    parameters[:op] = "IN"
                  elsif subk == "!="
                    parameters[:op] = "NOT IN"
                  else
                    parameters[:op] = normalised_k
                  end
                  parameters[:value] = subv.name
                else
                  parameters[:op] = normalised_k
                  parameters[:value] = subv
                end
              end
            end
            add_constraint(parameters)
          elsif v.is_a?(Range) or v.is_a?(Array)
            add_constraint(k.to_s, 'ONE OF', v.to_a)
          elsif v.is_a?(InterMine::Metadata::ClassDescriptor)
            add_constraint(:path => k.to_s, :sub_class => v.name)
          elsif v.is_a?(InterMine::Lists::List)
            add_constraint(k.to_s, 'IN', v.name)
          elsif v.nil?
            add_constraint(k.to_s, "IS NULL")
          else
            if path(k.to_s).is_attribute?
              add_constraint(k.to_s, '=', v)
            else
              add_constraint(k.to_s, 'LOOKUP', v)
            end
          end
        end
      end
      return self
    end
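For values that are not Hashes, `where` chooses a constraint from the Ruby type of the value. The sketch below condenses that dispatch into a standalone function so the mapping is easy to see. It is hypothetical: `ClassRef` and `ListRef` are made-up stand-ins for `ClassDescriptor` and `List`, and the caller passes in the attribute-or-not flag that `where` obtains from a `Path` object. The `normalise_op` helper repeats the operator normalisation that `where` applies to keys inside Hash values.

```ruby
# Hypothetical stand-ins for InterMine::Metadata::ClassDescriptor and
# InterMine::Lists::List, so the sketch runs without the client library.
ClassRef = Struct.new(:name)
ListRef  = Struct.new(:name)

# Operator keys inside Hash values are normalised as in where: :not_in -> "NOT IN".
def normalise_op(key)
  key.to_s.upcase.gsub(/_/, " ")
end

# Return the constraint parameters where would build for a non-Hash value.
# attribute: whether the constrained path ends in an attribute.
def simple_where_params(value, attribute: true)
  case value
  when Range, Array
    { op: "ONE OF", values: value.to_a }
  when ClassRef
    { sub_class: value.name }
  when ListRef
    { op: "IN", value: value.name }
  when nil
    { op: "IS NULL" }
  else
    if attribute
      { op: "=", value: value }
    else
      { op: "LOOKUP", value: value }
    end
  end
end

puts normalise_op(:not_in)                              # NOT IN
puts simple_where_params(%w[eve zen])[:op]              # ONE OF
puts simple_where_params("eve")[:op]                    # =
puts simple_where_params("eve", attribute: false)[:op]  # LOOKUP
```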