A backhoe working in a neighborhood next to a statue, safety cones, and a person parking their bike

At Gusto, most of our site is powered by a large, mature, and active Rails monolith. Rails 7.0 was released this past December and my team, Product Infrastructure, wanted to upgrade from Rails 6.1 sooner rather than later. Rails upgrades often produce large change sets: not only does the application itself change, but the updated Rails dependency pulls in a ton of other code changes!

In this blog post, I'll be walking through the process my team took to safely upgrade to Rails 7.0. While this is going to be relatively Rails and Ruby specific, I hope you can take away something for software you are upgrading.

We wanted our process to make the upgrade a non-event for our customers and engineers. The primary strategy was to break up the work into as many small changes as possible in order to reduce batch size. When you have large, long-running branches, I like to think of the unshipped code as a liability, and this Rails upgrade is no different. If you have to revert a large change, it’s a lot of work to track down what caused the problem. Instead, we made a series of small changes, and I'll get into some of the strategies we used for that.

This strategy hinges on having a robust and reliable Continuous Integration (CI) system. We have invested heavily in this over the years. That could be its own post, but the highlight is that we have enough unit and integration tests to catch a broad swath of problems before they go out to production.

With this strategy, here are the steps we took:

  1. Create a spike branch to get CI running
  2. Use RSpec pending to acknowledge failures on the branch
  3. Use Rails's deprecation tracking system to surface problems in the development branch
  4. Backport Rails 7.0 functionality to Rails 6.1 in the development branch

Let's look at each one.

The Spike

First we started a new branch and tried upgrading as a spike. There's always a chance that just updating the Rails dependency will work, but it's not a very good chance. The official "Upgrading Ruby on Rails" guide is a great resource to get started here.

The initial goal of the spike was to get the basics running:

  • bundle install
  • rails console
  • rails server
  • rspec
  • CI

At this point, we didn’t expect CI to pass, but we wanted to make sure that the test suites could be run, rather than just blowing up.

Working through test failures on the spike branch

Having failures on the spike branch means we had a list to work from!

Initially, it was a sea of red, and it was hard to choose any specific issue to work on. As an engineer, it's easy to look at a few different failures to see if there's anything easy that can be fixed. If it looks a lot more involved, it's tempting to put it down to look at something else. The problem there is that each person builds a little context, and the next comes in starting from scratch.

The way I solved this was by making a kanban-style GitHub Project board and creating an issue for each class of failure, since many were similar. We used issue assignments to track who was working on what, and kept a running commentary about our understanding of the problem. When an engineer was done with one task, they could see what other work was In Progress and needed help, or could pull a new task from the To Do column.

Initially, I had trouble tracking which failures I had created issues for. I realized I could use RSpec's pending feature to help. You can add pending to a failing spec to indicate that it's still being worked on, along with an explanation of why it’s pending. With that, the spec still runs, but won’t fail the build.

For our Rails 7 upgrade, I used this convention:

pending "Rails 7 <GitHub Issue URL>"

As I added those, we saw the total failures in the CI go down, and it helped highlight the remaining failures to be reviewed and tracked. We could also be confident that the combination of pending and the project board meant we knew what we had left.

Another aspect of pending is that it will actually start failing when a spec is marked as pending but passes, meaning something was fixed!

Going from spike to mergeable branch

As the spike PR was shaping up, we started reviewing what changes were made. Some of the changes lined up with the "Upgrading from Rails 6.1 to Rails 7.0" section of the upgrading guide, but there was a lot more.

Rails follows semantic versioning, and since 7.0 is a major release, it’s allowed to break API compatibility. Because of that, major version upgrades are a good opportunity to remove previously deprecated features, and Rails 7.0 took advantage of that.

Working through Deprecations

As the spike progressed, there were several cases where methods and classes were missing, and most turned out to be deprecated APIs that had now been removed. Rails has its own framework for tracking the use of deprecations, ActiveSupport::Deprecation, which you’ve probably seen from its default behavior:

DEPRECATION WARNING: ActiveRecord::Base.dump_schema_after_migration is deprecated and will be removed in Rails 7.1.

This framework can also be configured to raise an error instead, which is a good way to discover deprecations:

ActiveSupport::Deprecation.behavior = :raise

The downside to this is you have to fix all the deprecations, or you will get errors instead of logs, which is certainly undesirable in production. We want to be able to prevent new deprecation warnings from creeping in during development and CI, but we don't want to break production if there's usage in the code base that hasn't been tested.

ActiveSupport::Deprecation.behavior can also be configured with a lambda/proc/block. We have an internal gem that hooks into it, but it’s functionally pretty similar to the following:

# DeprecationException is not defined in this snippet; a plain
# StandardError subclass is assumed here.
class DeprecationException < StandardError; end

# list out any acknowledged deprecations to not fail on
ALLOWED_DEPRECATIONS = [
  "ActiveRecord::Base.dump_schema_after_migration is deprecated and will be removed in Rails 7.1."
  # ...
].freeze

ActiveSupport::Deprecation.behavior = ->(message, callstack, _deprecation_horizon, _gem_name) do
  # skip deprecations that are already acknowledged and tracked
  return if ALLOWED_DEPRECATIONS.any? { |allowed_message| message.include?(allowed_message) }

  e = DeprecationException.new(message)
  e.set_backtrace(callstack.map(&:to_s))
  if Rails.env.production?
    # report to error tracking instead of failing outright
  else
    # anywhere else, raise an error
    raise e
  end
end

Whenever it raised an error, we added an issue to the board, and added the deprecation to the allowed list. Once we were tracking them all, we had a pretty thorough list to work from. The best part is they can be easily fixed in the development branch, and continually reintegrated into our Rails 7 branch.
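
To make the allow-list behavior concrete, here is a minimal sketch of the same decision logic that runs without Rails. It is not our internal gem: build_deprecation_handler, its allowed, production, and reporter parameters, and this DeprecationException class are all hypothetical names used only for illustration.

```ruby
# Hypothetical error class for illustration only.
class DeprecationException < StandardError; end

# Builds a callable with the same four-argument shape as the handler above.
# `allowed`, `production`, and `reporter` are assumed parameters, not a Rails API.
def build_deprecation_handler(allowed:, production:, reporter:)
  lambda do |message, callstack, _deprecation_horizon, _gem_name|
    # Acknowledged deprecations are already tracked, so they are ignored.
    next if allowed.any? { |allowed_message| message.include?(allowed_message) }

    error = DeprecationException.new(message)
    error.set_backtrace(Array(callstack).map(&:to_s))

    if production
      # Hand the error to a reporter instead of failing outright.
      reporter.call(error)
    else
      # Anywhere else, fail fast so the new deprecation gets noticed.
      raise error
    end
  end
end

# Example wiring: collect "reported" errors in an array.
reported = []
allowed = ["dump_schema_after_migration is deprecated"]
ci_handler = build_deprecation_handler(
  allowed: allowed, production: false, reporter: ->(error) { reported << error }
)
production_handler = build_deprecation_handler(
  allowed: allowed, production: true, reporter: ->(error) { reported << error }
)
```

With this wiring, calling ci_handler with an allow-listed message does nothing, calling it with an unknown message raises DeprecationException, and calling production_handler with the same unknown message appends the error to reported instead of raising.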

We also monitored our bug tracking system for deprecations that weren’t caught by CI. When we found one, we added it as an issue to our project board, which also gave us an opportunity to pare down some technical debt in untested code.

Backporting

To make the diff for the Rails 7 pull request as small as possible, we needed to “backport” some changes so that we could fix deprecations on both the development branch running Rails 6.1 and the Rails 7.0 branch. Backporting is a practice you’ll often see for security fixes and LTS (long-term support) releases, where you apply code changes from the newest version of the code to a previously released version. In our case, we used it as a bridge to get from one version of Rails to the next.

To backport changes, you need to understand what changed. I found it really helpful to look at the Rails source code on GitHub and use blame to find the PR in which the change was made.

Here's an example where we decided to backport. We have some model tests that check polymorphic relationships by setting the _type column to something invalid, but in Rails 7.0, this started raising an exception:

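class Foo < ApplicationRecord
  # polymorphic: true is an option of belongs_to (has_one does not accept it)
  belongs_to :bar, polymorphic: true
end

foo = Foo.new
foo.bar_type = "non existent class"
foo.bar # => NameError on Rails 7.0, raised when the association is read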

We tracked this down to a change introduced in this PR that fixes this bug.

Granted, our model test isn’t ideal but it is a pattern in the code base. I didn't want to just change the test because our monolith's development is active enough that there's always a chance new code using this pattern is introduced. By backporting the change, we can fix the test and be sure that no other code is affected.

Having found the Rails PR that changed the behavior, I was able to create a backport by using a combination of module prepending and monkey patching a method:

module Rails
  module Backports
    module ReaderEnsureKlassExists
      def reader
        ensure_klass_exists!

        super
      end
    end
  end
end
module ActiveRecord
  module Associations
    class Association
      private

      # Reader and writer methods call this so that consistent errors are presented
      # when the association target class does not exist.
      def ensure_klass_exists!
        klass
      end 
    end

    class SingularAssociation
      prepend Rails::Backports::ReaderEnsureKlassExists
    end

    class CollectionAssociation
      prepend Rails::Backports::ReaderEnsureKlassExists
    end
  end

  module Reflection
    class AssociationReflection < MacroReflection #:nodoc:
      def compute_class(name)
        if polymorphic?
          raise ArgumentError, "Polymorphic associations do not support computing the class."
        end

        msg = <<-MSG.squish
          Rails couldn't find a valid model for #{name} association.
          Please provide the :class_name option on the association declaration.
          If :class_name is already provided make sure is an ActiveRecord::Base subclass.
        MSG

        begin
          klass = active_record.send(:compute_type, name)

          unless klass < ActiveRecord::Base
            raise ArgumentError, msg
          end

          klass
        rescue NameError
          raise NameError, msg
        end
      end
    end
  end
end
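
The backport above leans on Module#prepend, which places a module ahead of the class in the method lookup chain so that the module's method runs first and can then call super. Here is a minimal, Rails-free sketch of that mechanism. FakeAssociation and EnsureKlassExistsBackport are hypothetical stand-ins, not ActiveRecord classes.

```ruby
# Stand-in for a framework class we do not own (hypothetical, not ActiveRecord).
class FakeAssociation
  def initialize(class_name)
    @class_name = class_name
  end

  # "Old" behavior: never resolves the class, so a bad name goes unnoticed.
  def reader
    nil
  end

  private

  # Resolves the target class; raises NameError when the constant is missing.
  def klass
    Object.const_get(@class_name)
  end
end

# The "backport": because it is prepended, this reader runs before the
# original one and then defers to it with super.
module EnsureKlassExistsBackport
  def reader
    klass # raises NameError early if the class does not exist
    super
  end
end

FakeAssociation.prepend(EnsureKlassExistsBackport)
```

After the prepend, FakeAssociation.new("String").reader still returns nil as before, while FakeAssociation.new("MissingModel").reader raises NameError. The original class body is untouched, which makes a patch like this easy to delete once the real upgrade lands.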

We've done upgrades like this before, and already had a pattern for patches like this. We have a local gem (that is, one checked into the same repository) called rails-backports. Its entry point is a file that looks like this:

# lib/rails-backports.rb
if Gem::Requirement.new('~> 6.1.0').satisfied_by?(Rails.gem_version)
  require 'rails/backports/7.0/activerecord_associations_ensure_klass_exists'
end
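
The version gate above uses RubyGems' pessimistic operator: '~> 6.1.0' matches any 6.1.x release but not 6.2 or 7.0, so the backport stops loading on its own once the upgrade ships. A small sketch of how that check behaves, where backports_needed? is a hypothetical helper that takes a version string instead of reading Rails.gem_version:

```ruby
require "rubygems" # normally loaded already; listed for completeness

# Same requirement string as the entry point above.
BACKPORT_REQUIREMENT = Gem::Requirement.new("~> 6.1.0")

# Hypothetical helper for illustration: takes a version string so the
# check can be tried without loading Rails.
def backports_needed?(rails_version)
  BACKPORT_REQUIREMENT.satisfied_by?(Gem::Version.new(rails_version))
end

backports_needed?("6.1.4") # => true  (a 6.1.x release)
backports_needed?("6.0.4") # => false (older than 6.1.0)
backports_needed?("7.0.0") # => false (the backport is no longer loaded)
```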

By creating the backport, we could create a new branch off of development still running Rails 6.1, see the failures like we had on the Rails 7.0 branch, and then fix them.

The End Result

All said and done, we were able to ship this in 3 weeks with 6 engineers on it with minimal disruption to other engineers or customers.

The project board served us really well, and gave us nice insight into how much work an upgrade like this is.

After working through our project board and iterating on the spike branch, it was looking good enough to ship! It ended up having only Gemfile and Gemfile.lock changes, which felt really good.

I don’t want to say this upgrade happened flawlessly, but it was pretty close!