Whenever I write any sufficiently large Ruby app, I end up writing an expensive computation. I end up calling that computation over and over again even if the result is the same. I realize that’s a silly waste of resources and decide to save the result.
From that experience, I’ve learned a few standard patterns for memoization and used them too many times to ever want to rewrite them again. If you sail along with me on this journey, I’ll take you down memoization river with its many turns and bring you to the DRY land of a general solution.
Solving a Totally Contrived Example
The Problem
The journey starts out simple: I write an expensive query.
def users_with_discounts
User.includes(payment_plan: :discounts).where(paying: true).to_a
end
But alas, I end up calling this users_with_discounts
method quite a bit all over the place. It takes a while each time, so I wanna keep the results around.
1. A One-Line Solution
My next step is to memoize the result by saving it to an instance variable. The ||=
operator makes this easy for me since if the instance variable is already defined, I’ll get back my value, but, otherwise, I’ll go ahead and do the query.
def users_with_discounts
@users_with_discounts ||= User.includes(payment_plan: :discounts).where(paying: true).to_a
end
2. A Multi-Line Solution
We take another turn: now I need to do some post-processing for the query. That means it’s not as nice to use the ||=
operator for memoization anymore. Ugh. No worries, though: I’ll check if the the instance variable is present manually in the first line, and if not, do the processing and save it away in the last.
def users_with_discounts
return @users_with_discounts unless @users_with_discounts.nil?
users = User.includes(payment_plan: :discounts).where(paying: true).to_a
@users_with_discounts = users.select do |users|
users.payment_plan.discounts.any? && !users.payment_plan.delayed?
end
end
A Gotcha — Nil
I might as well mention now that in the case where our value returns nil
for the previous solutions, I’ll end up doing the re-computation every time. Imagine if we had:
def users_with_discounts
@users_with_discounts ||= computation_that_returns_nil
end
computation_that_returns_nil
gives back nil
. That means that the ||=
operator will set @users_with_discounts
to nil
. Which means that as soon as users_with_discounts
is called again the operator will find a nil value for @users_with_discounts
and try to set it again. And so the merry-go-round will go.
3. Dealing with Nil — with a Sentinel Value
One of the ways I can solve this is to use a sentinel value in order to signal that we don’t need to do this computation over and over again. If I find that our computation can return nil
, I can intercept the result and set it to false
, or even []
, or (if you’re on on the functional train) I could use the Nothing part of the Option
monad. Please do note that I still don’t use my good friend ||=
. It swallows false
in the same way it swallows nil
.
def users_with_discounts
return @users_with_discounts unless @users_with_discounts.nil?
@users_with_discounts = computation_that_returns_nil || false
end
However, using this solution also has a side-effect. While my unmemoized method used to return a Maybe[ArrayOf[User]]
, now it’ll be returning the ever more complicated Or[Boolean, ArrayOf[User]]
. If I don’t want to randomly change expectations of what this method returns, I’ll have to be more crafty.
4. Dealing with Nil — with a Cache
The other way I can deal with the presence of nil is by putting it behind a cache. For the purposes of that cache, I’ll be using Ruby’s handy Hash
.
def users_with_discounts
@users_with_discounts ||= {}
if @users_with_discounts.has_key?(:return_value)
@users_with_discounts[:return_value]
else
@users_with_discounts[:return_value] = computation_that_returns_nil
end
end
The first time we call users_with_discounts
, it will have its return_value
key set. That means that on further calls, regardless if that key was set to nil
or not, I’ll still have the result of the computation directly returned to me. I also end up with the same exact type for our result as when we started.
A Gotcha-Methods with Parameters
Let’s say that now I’d like to do the query with some changeable arguments. Does that mean I should stop memoizing? That depends.
- Do I expect the result to be the same for a given set of arguments?
- Am I okay with storing the results for each set of arguments?
If I say yes to both of the above, hope is not yet lost. I can still memoize my results. I just can’t do it the same way as I did above. Since the output of any function like 1+x
is different depending on x
, I’d have to take arguments into account.
5. Dealing with Methods with Parameters
Getting back to our query, I answer yes to the two questions above and memoization is in the stars. I can re-use the caching solution with a key difference: I will use the arguments as a key.
def users_with_discounts(scoped_to={})
@users_with_discounts ||= {}
return @users_with_discounts[scoped_to] if @users_with_discounts.has_key?(scoped_to)
users = User.includes(payment_plan: :discounts).where(
paying: true,
**scoped_to
).to_a
@users_with_discounts[scoped_to] = users.select do |users|
users.payment_plan.discounts.any? && !users.payment_plan.delayed?
end
end
Now, for every scoped_to
that is passed in, we’ll store its results away for reuse in our hash — making every future call next to instantaneous.
Why Did You Make Me Read All This?
A fair question. Honestly, it’s so that you can both (a) commiserate in the problem that I’ve had to solve like a half bajillion times and (b) wonder why I’m still solving this problem after I’ve solved it like a half bajillion times.
That’s precisely what I’ve been wondering. I’ve been waiting for the day I can just write a memoized
on top of my method and forget I ever had to suffer through one of these implementations forever.
memoized
def users_with_discounts(scoped_to={})
users = User.includes(payment_plan: :discounts).where(paying: true, **scoped_to).to_a
users.select do |users|
users.payment_plan.discounts.any? && !users.payment_plan.delayed?
end
end
Needless to say, that day has come. It is the general, DRY solution I’ve promised you all the way at the start of this post.
A Generalized Memoizer
module RubyMemoized
class Memoizer
attr_reader :context, :method
def initialize(context, method)
@context = context
@method = method
end
def call(*args, &block)
return cache[[args, block]] if cache.has_key?([args, block])
cache[[args, block]] = context.send(method, *args, &block)
end
def cache
@cache ||= {}
end
end
def self.included(klass)
klass.extend(ClassMethods)
end
module ClassMethods
def memoized
@memoized = true
end
def unmemoized
@memoized = false
end
def method_added(method_name)
if @memoized
@memoized = false
unmemoized_method_name = :"unmemoized_#{method_name}"
memoizer_name = :"memoizer_for_#{method_name}"
define_method memoizer_name do
memoizer = instance_variable_get "@#{memoizer_name}"
if memoizer
memoizer
else
instance_variable_set "@#{memoizer_name}", Memoizer.new(self, unmemoized_method_name)
end
end
alias_method unmemoized_method_name, method_name
define_method method_name do |*args, &block|
send(memoizer_name).call(*args, &block)
end
@memoized = true
end
end
end
end
Its Power is Immense
I wanted this to be a nice and lazy way of solving my memoization needs. So to prove that it’s exactly that, we’ll (crudely) solve the problem of finding the nth Fibonacci number.
class FibonacciCalculator
include RubyMemoized
def calculate(n)
return n if (0..1).include?(n)
calculate(n - 1) + calculate(n - 2)
end
memoized
def memoized_calculate(n)
return n if (0..1).include?(n)
memoized_calculate(n - 1) + memoized_calculate(n - 2)
end
end
The calculate
method is pretty lame. It never stores solutions to its sub-problems and always recalculates them. The memoized_calculate
method is its cousin that does.
On the surface, it doesn’t seem like there’ll be much of a difference between the two implementations. However, with the power of benchmarking, I will prove to you it is not so.
require 'benchmark'
Benchmark.bm do |measurer|
calculator = FibonacciCalculator.new
measurer.report 'with memoization' do
10.times { calculator.memoized_calculate(35) }
end
measurer.report 'without memoization' do
10.times { calculator.calculate(35) }
end
end
When we run the above code, here’s the breakdown we receive:
user system total real
with memoization 0.000000 0.000000 0.000000 ( 0.000469)
without memoization 46.010000 0.160000 46.170000 (46.597599)
Memoization wins 0.005 seconds to 46.598 seconds. I’d say I got my memoization needs covered. And so can you!
And Here’s How It Works
The Memoizer
class contains basically everything from our discussion above: looking at line 11, we can see the hash cache pattern from solution (4) above. Not only that, but we also see a more aggressive version of solution (5) for any arbitrary set of arguments and block. All of that is powered by our original idea to store data in an instance variable from solution (1) in the cache on line 16.
The rest of the code is quite a bit of metaprogramming.
Here’s a rundown of what it does:
Whenever a method is added after the memoized
class method is called, it will create a few new methods — the old unmemoized method will be renamed (line 49) and the new memoized method (line 51) will call a memoizer that was created specifically for it (line 40). That means that this memoization will work on a per instance and per method level.
The memoized
and unmemoized
class methods end up working in much the same way that the public
and private
methods do — everything underneath them will either be memoized or not up until the next call to either one of them.
But There are Tradeoffs
Remember the two questions I posed when we were talking about memoizing methods with parameters? Let’s re-examine them again in the context of our new generic memoizer:
Do I expect the result to be the same for a given set of arguments?
One way of reframing this questions is to ask, “Are the methods pure?” In other words, given a set of inputs, we will always get the same outputs. This won’t be true if we’re manipulating the state of our object to make the computation, since the state will need to change in between method calls. That means the output of our method will be different every time. For example: the method def sum(x); 1 + x; end
is a pure but def sum(x); 1 + x + external_number; end
is not.
So why does this matter? If we were working with the second method and using our memoizer, we’d get the wrong answer any time external_number
has changed. That would certainly be unexpected.
Here’s a dirty secret: our beloved users_with_discounts
method is not pure. The database state is external and can change at any time, so the same inputs will not give me back the same outputs. Then why did I say yes to my question?
In this particular case, I wanted to have a particular set of results to work through. If the state of the database had changed during the time I was manipulating my results, I didn’t want to see them — so I was effectively pretending that my method was pure. If new users with discounts came up during this time, I’d rather process them in a separate job that was running over just these new ones than having the first method call return one set of results and the other method call return another set — causing some changes to be applied to one set of objects but not to another.
Am I okay with storing the results for each set of arguments?
It uses a bit of memory to reduce the amount of processing work. So, if you’re storing millions of ActiveRecord records in that computation, remember that your process size will balloon for the duration of your computation. It’s also important to remember that if you end up calling the method with thousands of arguments that you’ll be storing thousands of results sets. In either case, remember to make this memory available for garbage collection with a neat trick: assigning the variable for the object that did the computation to nil
(in some scenarios, running GC.start
does not necessarily hurt either).
Conclusion
I suffered through many implementations of memoization. You saw me suffer through an example. I hope you don’t have to suffer. That’s why I put together this gem: https://github.com/kklimuk/ruby_memoized.
Just include it into your Gemfile
with gem 'ruby_memoized'
and get memoizing today!
Acknowledgments
Thank you to Justin Worth, Matt Wilde, and Yi Lang Mok for reviewing this article for both form and substance.
This article was originally published here.