Don’t Forget to Pass Results When Chaining SQL Queries!
My Experience Speeding Up ORM Methods for A Match Search
I learned an important lesson recently while writing code that finds Lego sets you can build based on Lego sets you already own:
Whenever possible, pass search results!
The Problem
Suppose you have a single match-search method you want to call, but you want some of the behaviors it performs to be reusable on their own. A good bet is to break up those behaviors into separate methods that can be used in your final match-search method. That’s what I did. That’s good reusable code.
But, if you don’t watch out, you might end up repeating some of the same queries to get those functions to be able to work on their own and be used in a longer string of methods you want to work together in sequence. …That’s also what I did. That repetition is bad coding.
Escaping Notice
I used Ruby’s ActiveRecord ORM, which is pretty fast. You can get away with some inefficiencies if you aren’t doing a lot of queries, and you might not even notice.
I never really noticed before this project.
My first build of the match-search feature took about 9 1/2 minutes to spit out its results when I had a total of around 340 Lego sets in my database.
That seemed far too long to be very useful to me. After going back through my code, I noticed it wasn’t as DRY as it needed to be.
The Fix
Without getting too far into the details, I discovered that I was calling the same query methods more than once, which is a problem when you are dealing with hundreds of sets, thousands of parts, and you’re iterating through them all!
In a nutshell, the solution I found was to have a default parameter. This allows you to run the method two ways. (1) If you know you want to run the method in a certain specified sequence and pass results into it, you open yourself up to the option of passing the results as an argument so that you Don’t Repeat Yourself in running a query. (2) If you think you might want to run the method in a different context, you can call the method without passing an argument, which will make use of your default parameter.
The Example
(Note the bold text.)
class LegoSet < ApplicationRecorddef self.potential_builds_regardless_of_color owned_parts = self.parts_and_quantity_owned
pot_sets_filtered = self.sets_filtered_by_part_types(owned_parts) results = [] pot_sets_filtered.each do |set|
if set.array_parts_quantity.all? do |part_record|
part_record[:part_quant] <= owned_parts.detect{|owned_part|
owned_part[:part_num] == part_record[:part_num]}[:part_quant]
end
set.update(potential_build: true)
results << set
end
end results
end
def self.sets_filtered_by_part_types(owned_parts = [])
if owned_parts.empty?
owned_parts = self.parts_and_quantity_owned
end
owned_parts_filter_array = owned_parts.map {|part_record|
part_record[:part_num]}
sets_parts_match = self.select do |set|
!set.owned && !set.match_parts_of_set.empty? &&
(set.match_parts_of_set.map {|part| part.part_num} -
owned_parts_filter_array).empty?
end
end...
end
There’s a lot going on here, and I’ve not given the code to the other methods being called within these methods.
On a high level—looking at the forest and not too closely at the trees—note that in the top method, called potential_builds_regardless_of_color
, I need to use owned_parts
inside the enumerable .all?
block. I also use owned_parts
to get the array of pot_sets_filtered
, using the helper method sets_filtered_by_part_types
(the second method above). So…I need to find the owned_parts
, and also use it in the helper method called inside this method.
When I first wrote this match-search feature, I did not think to find owned_parts
and then pass it to sets_filtered_by_part_types
. I just ran the same query and variable assignment owned_parts = self.parts_and_quantity_owned
both inside the top method, and inside the helper method.
But the parts_and_quantity_owned
method here calls a lot of other query methods (not listed), which run through all of the sets I own, aggregating all of the parts that belong to those sets by their part type and using a join-table to find the quantity of each part in each set.
Running that code twice took a lot of time…needlessly.
The Recap
Since I wanted to be able to run the sets_filtered_by_part_types
on its own for potential future functionality, I wanted to be able to run the method without passing it an owned_parts
value.
The right compromise, once again, was to set the method with a default parameter with an empty default value (owned_parts = []
)so that (1) I could pass down the results of owned_parts = self.parts_and_quantity_owned
and not force my program to run that method and its sub-methods all over again when needed, and (2) it could be run on its own without passing in results.
The Upshot
Once I realized I could speed things up by passing query method results to other methods, the time my search feature took to run went from 9 1/2 minutes with 340 Lego sets in my database to 21 seconds with 740 Lego sets!
So, if you find yourself querying the database a lot, make sure you’re not running the same query more than once, like I was. If you are, try passing the results between your query methods.