Embracing Functional Programming in Ruby

This post was originally posted on kellysutton.com. This message has been modified to fit your screen.

At Gusto, we’ve been knee-deep in a substantial refactor of our system for running payrolls.

Running a payroll requires taking several different inputs: how much an employee should get paid, where they worked, how much they worked, how much tax they should pay, how much tax they have already paid this year, and so on.

As a company that offers a payroll service, keeping this piece of the system in tip-top shape is important for the business. Customers love Gusto for its simplicity and speed when it comes to running payroll.

Over the years, this system grew beyond its original mandate. Rather than just serving payroll for one state, it now serves payroll for all 50 states and the District of Columbia. Although customers love our payroll, new engineers inside the company had a difficult time understanding the code and making changes safely. This system needed a tune-up, so we embarked on a sizable refactor.

Because the process of calculating what you need for a payroll is one big formula, we set the goal of making this system “more functional” as in functional programming. We wanted to take the process of calculating a payroll and make it one big stateless operation.

The server-side code at Gusto is written in Ruby, a language usually known for its object-oriented and metaprogramming roots. Nonetheless, we wanted to integrate some more functional concepts into our code in the hopes of increasing the system’s safety and clarity. The result has been maintainable code that is easier to reason about and safer to change.

Embracing the PFaaO

Ruby is an expressive language, but it does not lend itself to some common functional practices. Although Ruby allows for closures and first-class functions via Procs, one does not see many Procs passed around as objects in idiomatic Ruby.

Throughout our work, we discovered that you can create expressive interfaces with clean internals by embracing both OO and functional aspects of Ruby.

To do this, we used a pattern called Pure Function as an Object (PFaaO). Essentially, you design objects as you would a pure function but dress them up as Ruby classes.

A pure function is a function without observable side effects that always returns the same value for a given set of inputs. That means no talking to the database, no modifying the state of other objects, no accessing the system clock, etc. When we write a PFaaO in Ruby, we want to build an object that has no side effects.
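To make the distinction concrete, here is a minimal sketch of an impure method next to a pure one. The method names and the payday scenario are made up for illustration; they are not from our codebase.

```ruby
require "date"

# Impure: the result depends on the system clock, so the same argument
# can produce a different answer tomorrow.
def days_until_payday_impure(payday)
  (payday - Date.today).to_i
end

# Pure: everything the calculation needs is passed in explicitly.
def days_until_payday(payday, today)
  (payday - today).to_i
end

payday = Date.new(2024, 3, 15)
today  = Date.new(2024, 3, 1)
days_until_payday(payday, today) # => 14
```

The pure version pushes the clock read out to the caller, which is also what makes it trivial to test.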

A simple PFaaO might look like the following:

class PayrollCalculator
  def self.calculate(payroll)
    new(payroll).calculate
  end

  def initialize(payroll)
    @payroll = payroll
  end
  private_class_method :new

  def calculate
    PayrollResult.new(
      payroll: @payroll,
      paystubs: paystubs,
      taxes: taxes,
      debits: debits
    )
  end

  def paystubs
    # ...
  end

  def taxes
    # ...
  end

  def debits
    # ...
  end
end

There’s quite a bit going on here, so let's break it down bit by bit.

First, our class has only one effective public interface: PayrollCalculator.calculate. Because we've declared the constructor private using private_class_method :new, the instance method #calculate is effectively private.^[1]

This means that all of the other instance methods we declare are effectively private too, even though there is no explicit private block within this class. Because there’s no way to .new up an instance from outside the class, there is no way for a caller to reach any instance method.
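A stripped-down sketch shows the effect of the private constructor. Doubler is a toy class invented for this example; only the shape matches PayrollCalculator.

```ruby
class Doubler
  # The single public entry point.
  def self.calculate(number)
    new(number).calculate
  end

  def initialize(number)
    @number = number
  end
  private_class_method :new

  def calculate
    @number * 2
  end
end

Doubler.calculate(21) # => 42

# Calling the constructor from outside the class is rejected.
constructor_blocked =
  begin
    Doubler.new(21)
    false
  rescue NoMethodError
    true
  end
constructor_blocked # => true
```

The class method can still call new because it does so with an implicit receiver from inside the class itself.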

Our class has only one public interface and its designed operation is effectively stateless, so we only need to exercise one interface in our tests. Put some data in, assert that the data coming out is what we expected.

Referential Transparency for Free

In our above example, let’s say that the process of calculating taxes is expensive from a time perspective.^[2] Thus, we want to make a time/space tradeoff to consume more memory to minimize the number of times we need to compute taxes. In our example, calculating both #paystubs and #debits will require the result of #taxes.

Now because each of these private methods is a pure function, we have referential transparency. This means we can replace a method and its parameters with its return value. Think of it like algebra: Given the function f(x) = x + 5, you can safely replace any occurrence of f(2) with the value 7.
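The algebra example translates directly into Ruby. This is only an illustration of the substitution, with f defined exactly as in the formula above:

```ruby
def f(x)
  x + 5
end

# Because f is pure, every call f(2) can be swapped for the value 7
# without changing the result of the surrounding expression.
with_calls  = f(2) * 3 + f(2)
with_values = 7 * 3 + 7

with_calls == with_values # => true (both are 28)
```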

What does this mean for a Rubyist? Free and safe memoization:

def paystubs
  calculate_paystubs(taxes, ...)
end

def debits
  calculate_debits(taxes, ...)
end

def taxes
  @taxes ||= calculate_taxes(@payroll)
end

Memoization is a form of caching, and can be fraught with issues if the memoized value does not actually come from a pure function. But because we make everything within the PFaaO pure, we can safely memoize this method call.

This is interesting because it looks like this class is no longer stateless: it now assigns an instance variable. However, since the only interface is the single .calculate class method, each instance of our PFaaO is single-use. Any intermediate state can never be accessed externally. Because this cached state is not observable from outside, our function is still technically pure.

Much in the way a developer can abstract synchronous and asynchronous behavior, you can do the same with functional purity. Any local state changes are irrelevant in the lifecycle of the PFaaO. These local state changes are not observable from the outside world.
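Here is a small runnable sketch of that idea. TaxTotal, the wages array, and the flat 10% rate are all assumptions made up for the example; real payroll taxes look nothing like this.

```ruby
class TaxTotal
  def self.calculate(wages)
    new(wages).calculate
  end

  def initialize(wages)
    @wages = wages
  end
  private_class_method :new

  def calculate
    { taxes: taxes, net_pay: net_pay }
  end

  def net_pay
    @wages.sum - taxes
  end

  # Memoized: computed on the first call, reused by #net_pay and #calculate.
  # The cache lives and dies with this single-use instance.
  def taxes
    @taxes ||= @wages.sum { |wage| wage * 10 / 100 }
  end
end

first  = TaxTotal.calculate([1000, 2000]) # => { taxes: 300, net_pay: 2700 }
second = TaxTotal.calculate([1000, 2000])
first == second # => true
```

Each call to .calculate builds a fresh instance, so nothing cached in one call can leak into the next.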

Expanding PFaaOs

As I’ve grown in my career, I have become less interested in how software is written and more interested in how it is maintained. Software maintenance is the blessing and the curse of any successful project: Congratulations! You have a business with lasting value. Our condolences! You must now pay for all of your mistakes. Nonetheless, it is always preferable to have a living business with technical debt than a bankrupt company with a pristine code base.

PFaaOs in Ruby are great because they are easy to maintain. Not only are they easy to test, but they are predisposed to healthy growth.

What do I mean by that?

Let’s again take the example of our #taxes method. Early in Gusto’s history (back when it was still known as ZenPayroll), we only offered payroll services in California. Thus, we only needed to worry about payroll taxes for California.

In the grand scheme of things, California is a simple state when it comes to payroll taxes. Our taxes method might have looked like nothing more than the following:

def taxes
  federal_taxes(@payroll) +
    california_taxes(@payroll) +
    local_taxes(@payroll)
end

Now let’s say we expanded into a new state, New York. Now our method grows a little bit:

def taxes
  federal_taxes(@payroll) +
    california_taxes(@payroll) +
    new_york_taxes(@payroll) +
    local_california_taxes(@payroll) +
    local_new_york_taxes(@payroll)
end

As we expand into every state,^[3] this method will grow to be quite large! Furthermore, each of these methods adds to the length of our PayrollCalculator class. Without constant gardening, the class could become difficult to understand.

But because each of our methods within a PFaaO is itself a pure function, we are able to extract classes as we see fit and make each one a new PFaaO. We can safely replace our growing methods with new PFaaOs:

def taxes
  PayrollCalculator::Taxes.calculate(@payroll)
end

As we tease apart these different PFaaOs, we also get a much better idea of the input requirements for these service classes. Our @payroll is a large parameter object, and each extracted PFaaO probably only needs a subset of its data.

So we can get away with something like:

def taxes
  PayrollCalculator::Taxes.calculate(
    @payroll.only_pay_and_location_data
  )
end

Here we assume that the Payroll#only_pay_and_location_data returns a slice of the total data within the instance as a new Value Object. This Value Object represents only the data required to calculate the taxes part of running a payroll.
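One way such a slice could be built is sketched below. The attribute names (gross_pay_cents, work_state, bank_account, memo) and the PayAndLocationData struct are hypothetical; the post only assumes that Payroll#only_pay_and_location_data exists.

```ruby
# Hypothetical value object holding only what the tax calculation needs.
PayAndLocationData = Struct.new(:gross_pay_cents, :work_state, keyword_init: true)

# Stand-in for the large parameter object; fields are invented for illustration.
class Payroll
  def initialize(gross_pay_cents:, work_state:, bank_account:, memo:)
    @gross_pay_cents = gross_pay_cents
    @work_state = work_state
    @bank_account = bank_account
    @memo = memo
  end

  # Returns a new frozen value object with a subset of the data.
  def only_pay_and_location_data
    PayAndLocationData.new(
      gross_pay_cents: @gross_pay_cents,
      work_state: @work_state
    ).freeze
  end
end

payroll = Payroll.new(gross_pay_cents: 500_000, work_state: "NY",
                      bank_account: "acct-1", memo: "March")
slice = payroll.only_pay_and_location_data
slice.to_h    # => { gross_pay_cents: 500000, work_state: "NY" }
slice.frozen? # => true
```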

Data is Immutable by Default

Another important ingredient for scalable PFaaOs is the requirement that all data be immutable by default. This is a drastic change from how most folks traditionally write Ruby.

Every time you reach for your =, you'll need to replace it with a #set or #put. Rather than modifying objects in place, you will get used to returning new copies with new values. (Hamster, which provides great immutable data structures, can save you from having to hand-roll FP functionality.)
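The same habit can be practiced with nothing but the standard library. The sketch below uses a frozen Hash and Hash#merge in place of Hamster; the paystub fields are invented for the example.

```ruby
paystub = { employee: "Ada", net_pay_cents: 250_000 }.freeze

# Hash#merge returns a new hash and leaves the receiver untouched.
adjusted = paystub.merge(net_pay_cents: 240_000).freeze

paystub[:net_pay_cents]  # => 250000
adjusted[:net_pay_cents] # => 240000

# In-place modification of a frozen hash raises FrozenError.
mutation_blocked =
  begin
    paystub[:net_pay_cents] = 0
    false
  rescue FrozenError
    true
  end
mutation_blocked # => true
```

Note that freeze is shallow: it protects the hash itself, not the objects stored inside it, which is one reason a purpose-built library can be worth the dependency.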

What does this mean for Rails? It will often mean creating functions or classes that take ActiveRecord objects and convert them into immutable value objects. For us, we carve out these value objects into the namespace of what we're doing. For example, here are the two representations of a payroll in our system:

# app/models/payroll.rb
class Payroll < ActiveRecord::Base
end

# app/services/payroll_calculator/payroll.rb
class PayrollCalculator::Payroll < ValueObject
end

The ActiveRecord version of a payroll represents the data that lives in the database. It is a superset of the data required for actually running a payroll. Although they have the same name, they do not have the same attributes. For example, the ActiveRecord version of Payroll will have a processed_at attribute, whereas the Payroll that lives in the calculation domain does not.

In the words of Domain-Driven Design, each namespace here is a different Bounded Context. We implement adapters to take ActiveRecord payrolls and turn them into PayrollCalculator payrolls, and vice versa.
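A minimal adapter might look like the following sketch. To keep it runnable without Rails, PayrollRecord is a plain Struct standing in for the ActiveRecord model, and PayrollAdapter, gross_pay_cents, and work_state are hypothetical names; only processed_at comes from the description above.

```ruby
# Stand-in for the ActiveRecord-backed Payroll model.
PayrollRecord = Struct.new(:id, :gross_pay_cents, :work_state, :processed_at,
                           keyword_init: true)

class PayrollCalculator
  # Calculation-side value object: no id and no processed_at.
  Payroll = Struct.new(:gross_pay_cents, :work_state, keyword_init: true)

  # Hypothetical adapter between the two Bounded Contexts.
  module PayrollAdapter
    def self.from_record(record)
      Payroll.new(
        gross_pay_cents: record.gross_pay_cents,
        work_state: record.work_state
      ).freeze
    end
  end
end

record = PayrollRecord.new(id: 1, gross_pay_cents: 500_000,
                           work_state: "CA", processed_at: nil)
payroll = PayrollCalculator::PayrollAdapter.from_record(record)

payroll.work_state                 # => "CA"
payroll.respond_to?(:processed_at) # => false
```

If a column is added to or removed from the database table, only from_record has to change; the calculator keeps seeing the same value object.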

The upside of this is something you might see in any other large system with well-defined abstractions; changes in models don’t cross domains. In our example, we can change the structure of the Payroll in our database without needing to change the calculation code. We would only need to change our adapter. Furthermore, this context is entirely separate from the machinations of Rails. We could easily and safely pull this into its own gem or a separate service entirely.

If our ActiveRecord objects were passed directly to our calculator as parameters, adding or removing columns from them could cause a series of cascading, painful, and dangerous changes.

For young Rails apps, this level of indirection is overkill. As apps grow and multiple teams begin contributing to the same application, Bounded Contexts like these are necessary.

Conclusion

We’ve been slowly refactoring our payroll calculators toward this model and use it to safely process upwards of $1 billion per month.

The results have been remarkable: adding or changing payroll code is now a much safer operation. Because each change is much more isolated, a developer only needs to concern herself with the local implementation.

Although this post does not cover it, testing PFaaOs with immutable data is a breeze. We find ourselves performing less setup for each method and class. Our tests remain fast as they do not hit the database.

It’s not all sunshine and rainbows, though. This approach does result in a larger volume of code. My rough estimate would peg it at about a 1.5x to 2x increase in code volume. Some developers dislike the sprawling nature of the many PFaaOs that result. Although the total lines of code will increase, this approach should help your team develop a better understanding of the data requirements of each Bounded Context. Put another way: you don't need to pass around whole ActiveRecord objects, but just small bundles of their attributes.

Before embracing this completely, discuss with your team to set up a few ground rules. We typically shoot for about 100 lines per class, but your team might decide on something different. Make sure to get on the same page and agree that your app is at the size where it might benefit from this style of thinking.

For some teams, the extra layers of abstraction between ActiveRecord and doing interesting things with the data might seem like overkill. In many situations, it will be. Again, I encourage you to have a healthy discussion with your team to decide if the benefits of this approach outweigh the negatives.

For us, we’re employing it everywhere appropriate. Give this pattern a shot and let me know how it goes!

Special thanks to Justin Duke, Eddie Kim, Bo Sørensen, Matt Lewis, and Julia Lee for providing feedback on early drafts of this post.

[1] Keen readers will know that nothing is ever really private in Ruby. There is always #send.
[2] At Gusto, calculating taxes is expensive! Did you know that there are more than 6,000 payroll taxes within the United States? Each one may or may not need to be applied for a given payroll, based on the different parameters of the payroll itself.
[3] Today, Gusto provides payroll services in every state and D.C. with some of the lowest error rates in the industry.
