Memoization in Ruby (made easy)

Whenever I write any sufficiently large Ruby app, I end up writing an expensive computation. I end up calling that computation over and over again even if the result is the same. I realize that’s a silly waste of resources and decide to save the result.

From that experience, I’ve learned a few standard patterns for memoization and used them too many times to ever want to rewrite them again. If you sail along with me on this journey, I’ll take you down memoization river with its many turns and bring you to the DRY land of a general solution.

Solving a Totally Contrived Example

The Problem

The journey starts out simple: I write an expensive query.

def users_with_discounts
  User.includes(payment_plan: :discounts).where(paying: true).to_a
end

But alas, I end up calling this users_with_discounts method quite a bit all over the place. It takes a while each time, so I wanna keep the results around.

1. A One-Line Solution

My next step is to memoize the result by saving it to an instance variable. The ||= operator makes this easy for me since if the instance variable is already defined, I’ll get back my value, but, otherwise, I’ll go ahead and do the query.

def users_with_discounts
  @users_with_discounts ||= User.includes(payment_plan: :discounts).where(paying: true).to_a
end

2. A Multi-Line Solution

We take another turn: now I need to do some post-processing for the query. That means it’s not as nice to use the ||= operator for memoization anymore. Ugh. No worries, though: I’ll check if the the instance variable is present manually in the first line, and if not, do the processing and save it away in the last.

def users_with_discounts
  return @users_with_discounts unless @users_with_discounts.nil?
  
  users = User.includes(payment_plan: :discounts).where(paying: true).to_a
  @users_with_discounts = users.select do |users|
    users.payment_plan.discounts.any? && !users.payment_plan.delayed?
  end
end

A Gotcha — Nil

I might as well mention now that in the case where our value returns nil for the previous solutions, I’ll end up doing the re-computation every time. Imagine if we had:

def users_with_discounts
  @users_with_discounts ||= computation_that_returns_nil
end

computation_that_returns_nil gives back nil. That means that the ||= operator will set @users_with_discounts to nil. Which means that as soon as users_with_discounts is called again the operator will find a nil value for @users_with_discounts and try to set it again. And so the merry-go-round will go.

3. Dealing with Nil — with a Sentinel Value

One of the ways I can solve this is to use a sentinel value in order to signal that we don’t need to do this computation over and over again. If I find that our computation can return nil, I can intercept the result and set it to false, or even [], or (if you’re on on the functional train) I could use the Nothing part of the Option monad. Please do note that I still don’t use my good friend ||=. It swallows false in the same way it swallows nil.

def users_with_discounts
  return @users_with_discounts unless @users_with_discounts.nil?
  @users_with_discounts = computation_that_returns_nil || false
end

However, using this solution also has a side-effect. While my unmemoized method used to return a Maybe[ArrayOf[User]], now it’ll be returning the ever more complicated Or[Boolean, ArrayOf[User]]. If I don’t want to randomly change expectations of what this method returns, I’ll have to be more crafty.

4. Dealing with Nil — with a Cache

The other way I can deal with the presence of nil is by putting it behind a cache. For the purposes of that cache, I’ll be using Ruby’s handy Hash.

def users_with_discounts
  @users_with_discounts ||= {}
  if @users_with_discounts.has_key?(:return_value)
    @users_with_discounts[:return_value]
  else
    @users_with_discounts[:return_value] = computation_that_returns_nil  
  end
end

The first time we call users_with_discounts, it will have its return_value key set. That means that on further calls, regardless if that key was set to nil or not, I’ll still have the result of the computation directly returned to me. I also end up with the same exact type for our result as when we started.

A Gotcha-Methods with Parameters

Let’s say that now I’d like to do the query with some changeable arguments. Does that mean I should stop memoizing? That depends.

Do I expect the result to be the same for a given set of arguments?
Am I okay with storing the results for each set of arguments?

If I say yes to both of the above, hope is not yet lost. I can still memoize my results. I just can’t do it the same way as I did above. Since the output of any function like 1+x is different depending on x, I’d have to take arguments into account.

5. Dealing with Methods with Parameters

Getting back to our query, I answer yes to the two questions above and memoization is in the stars. I can re-use the caching solution with a key difference: I will use the arguments as a key.

def users_with_discounts(scoped_to={})
  @users_with_discounts ||= {}
  return @users_with_discounts[scoped_to] if @users_with_discounts.has_key?(scoped_to)

  users = User.includes(payment_plan: :discounts).where(
    paying: true, 
    **scoped_to
  ).to_a
  
  @users_with_discounts[scoped_to] = users.select do |users|
    users.payment_plan.discounts.any? && !users.payment_plan.delayed?
  end
end

Now, for every scoped_to that is passed in, we’ll store its results away for reuse in our hash — making every future call next to instantaneous.

Why Did You Make Me Read All This?

A fair question. Honestly, it’s so that you can both (a) commiserate in the problem that I’ve had to solve like a half bajillion times and (b) wonder why I’m still solving this problem after I’ve solved it like a half bajillion times.

That’s precisely what I’ve been wondering. I’ve been waiting for the day I can just write a memoized on top of my method and forget I ever had to suffer through one of these implementations forever.

memoized
def users_with_discounts(scoped_to={})
  users = User.includes(payment_plan: :discounts).where(paying: true, **scoped_to).to_a
  users.select do |users|
    users.payment_plan.discounts.any? && !users.payment_plan.delayed?
  end
end

Needless to say, that day has come. It is the general, DRY solution I’ve promised you all the way at the start of this post.

A Generalized Memoizer

module RubyMemoized
  class Memoizer
    attr_reader :context, :method

    def initialize(context, method)
      @context = context
      @method = method
    end

    def call(*args, &block)
      return cache[[args, block]] if cache.has_key?([args, block])
      cache[[args, block]] = context.send(method, *args, &block)
    end

    def cache
      @cache ||= {}
    end
  end

  def self.included(klass)
    klass.extend(ClassMethods)
  end

  module ClassMethods
    def memoized
      @memoized = true
    end

    def unmemoized
      @memoized = false
    end

    def method_added(method_name)
      if @memoized
        @memoized = false

        unmemoized_method_name = :"unmemoized_#{method_name}"
        
        memoizer_name = :"memoizer_for_#{method_name}"
        define_method memoizer_name do
          memoizer = instance_variable_get "@#{memoizer_name}"
          if memoizer
            memoizer
          else
            instance_variable_set "@#{memoizer_name}", Memoizer.new(self, unmemoized_method_name)
          end
        end

        alias_method unmemoized_method_name, method_name

        define_method method_name do |*args, &block|
          send(memoizer_name).call(*args, &block)
        end

        @memoized = true
      end
    end
  end
end

Its Power is Immense

I wanted this to be a nice and lazy way of solving my memoization needs. So to prove that it’s exactly that, we’ll (crudely) solve the problem of finding the nth Fibonacci number.

class FibonacciCalculator
  include RubyMemoized

  def calculate(n)
    return n if (0..1).include?(n)
    calculate(n - 1) + calculate(n - 2)
  end

  memoized

  def memoized_calculate(n)
    return n if (0..1).include?(n)
    memoized_calculate(n - 1) + memoized_calculate(n - 2)
  end
end

The calculate method is pretty lame. It never stores solutions to its sub-problems and always recalculates them. The memoized_calculate method is its cousin that does.

On the surface, it doesn’t seem like there’ll be much of a difference between the two implementations. However, with the power of benchmarking, I will prove to you it is not so.

require 'benchmark'

Benchmark.bm do |measurer|
  calculator = FibonacciCalculator.new

  measurer.report 'with memoization' do
    10.times { calculator.memoized_calculate(35) }
  end

  measurer.report 'without memoization' do
    10.times { calculator.calculate(35) }
  end
end

When we run the above code, here’s the breakdown we receive:

                    user       system    total     real
with memoization    0.000000   0.000000  0.000000  ( 0.000469)
without memoization 46.010000  0.160000  46.170000 (46.597599)

Memoization wins 0.005 seconds to 46.598 seconds. I’d say I got my memoization needs covered. And so can you!

And Here’s How It Works

The Memoizer class contains basically everything from our discussion above: looking at line 11, we can see the hash cache pattern from solution (4) above. Not only that, but we also see a more aggressive version of solution (5) for any arbitrary set of arguments and block. All of that is powered by our original idea to store data in an instance variable from solution (1) in the cache on line 16.
The rest of the code is quite a bit of metaprogramming.

Here’s a rundown of what it does:
Whenever a method is added after the memoized class method is called, it will create a few new methods — the old unmemoized method will be renamed (line 49) and the new memoized method (line 51) will call a memoizer that was created specifically for it (line 40). That means that this memoization will work on a per instance and per method level.

The memoized and unmemoized class methods end up working in much the same way that the public and private methods do — everything underneath them will either be memoized or not up until the next call to either one of them.

But There are Tradeoffs

Remember the two questions I posed when we were talking about memoizing methods with parameters? Let’s re-examine them again in the context of our new generic memoizer:

Do I expect the result to be the same for a given set of arguments?

One way of reframing this questions is to ask, “Are the methods pure?” In other words, given a set of inputs, we will always get the same outputs. This won’t be true if we’re manipulating the state of our object to make the computation, since the state will need to change in between method calls. That means the output of our method will be different every time. For example: the method def sum(x); 1 + x; end is a pure but def sum(x); 1 + x + external_number; end is not.

So why does this matter? If we were working with the second method and using our memoizer, we’d get the wrong answer any time external_number has changed. That would certainly be unexpected.

Here’s a dirty secret: our beloved users_with_discounts method is not pure. The database state is external and can change at any time, so the same inputs will not give me back the same outputs. Then why did I say yes to my question?

In this particular case, I wanted to have a particular set of results to work through. If the state of the database had changed during the time I was manipulating my results, I didn’t want to see them — so I was effectively pretending that my method was pure. If new users with discounts came up during this time, I’d rather process them in a separate job that was running over just these new ones than having the first method call return one set of results and the other method call return another set — causing some changes to be applied to one set of objects but not to another.

Am I okay with storing the results for each set of arguments?

It uses a bit of memory to reduce the amount of processing work. So, if you’re storing millions of ActiveRecord records in that computation, remember that your process size will balloon for the duration of your computation. It’s also important to remember that if you end up calling the method with thousands of arguments that you’ll be storing thousands of results sets. In either case, remember to make this memory available for garbage collection with a neat trick: assigning the variable for the object that did the computation to nil (in some scenarios, running GC.start does not necessarily hurt either).

Conclusion

I suffered through many implementations of memoization. You saw me suffer through an example. I hope you don’t have to suffer. That’s why I put together this gem: https://github.com/kklimuk/ruby_memoized.

Just include it into your Gemfile with gem 'ruby_memoized' and get memoizing today!

Acknowledgments

Thank you to Justin Worth, Matt Wilde, and Yi Lang Mok for reviewing this article for both form and substance.

This article was originally published here.

Engineering ( 50 )

Ruby On Rails ( 16 )

Advice ( 14 )

Career growth ( 14 )

Collaboration ( 13 )

Software Development ( 12 )

Diversity Inclusion ( 11 )

Programming ( 10 )

Modular Monolith ( 9 )

Security ( 9 )

Best Practice ( 9 )

Modularization ( 9 )

Gusto ( 8 )

Spaghetti Code ( 7 )

Ruby ( 6 )

Interviews ( 5 )

Sidekiq ( 5 )

Gradual Modularization ( 5 )

Teamwork ( 4 )

Product Management ( 4 )

Engineering Management ( 4 )

Tech Lead ( 4 )

Refactoring ( 4 )

Monolith ( 4 )

Coding ( 4 )

growth ( 4 )

Technical Strategy ( 4 )

Spaghetti Model ( 4 )

Teams ( 3 )

Startups ( 3 )

Database ( 3 )

Authorization ( 3 )

Guide ( 3 )

Data Engineering ( 3 )

Tidying ( 3 )

Api ( 3 )

Experiment ( 3 )

Decision ( 3 )

Gusto Values ( 3 )

Productivity ( 3 )

Performance ( 3 )

Payroll ( 3 )

Architecture ( 3 )

React ( 2 )

Javascript ( 2 )

PM ( 2 )

Job Hunt ( 2 )

Migration ( 2 )

Redis ( 2 )

Incident Response ( 2 )

Women ( 2 )

Rails Engines ( 2 )

Startup Lessons ( 2 )

Big Data ( 2 )

Getting Started ( 2 )

Data Lake ( 2 )

Pull Request ( 2 )

Code Review ( 2 )

Integrations ( 2 )

Outcomes ( 2 )

MVP ( 2 )

Client Platform Engineering ( 2 )

CI/CD ( 2 )

PII ( 2 )

Testing ( 2 )

Packwerk ( 2 )

Gem ( 2 )

Platform ( 2 )

Mentorship ( 2 )

Internship ( 2 )

Backbone ( 1 )

Webpack ( 1 )

Immutable ( 1 )

Cto ( 1 )

Double Write ( 1 )

Single Read ( 1 )

Models ( 1 )

Double Read ( 1 )

Single Write ( 1 )

Zero Downtime ( 1 )