This past summer, I published a blog post called “Laying the Cultural and Technical Foundation for Big Rails” and spoke about it at RailsConf.
Since then, we’ve made a lot of improvements. In this post I want to share the latest public iteration of the RubyAtScale Modularization Toolchain, which we use at Gusto to modularize our large Ruby on Rails application and which is still under active development.
This post is mostly meant as a how-to guide to effectively use these solutions, but in a future post I’ll dive more into the problem of large, entangled codebases and how modularization principles can be used to address that problem.
Packs and their Extensions
The toolchain starts with packs, a pretty simple specification for a Ruby package, or “pack” as we call it. By itself, packs doesn’t do much. Instead, it’s extended by a modular suite of tools that can be adopted gradually:
- packs-rails can be dropped into your Rails application to ensure Rails autoload paths are set up correctly. It also includes helpers to make rspec and factory_bot integrate with your packs.
- rubocop-packs contains cops intended to help improve boundaries between your packs.
- packwerk, from Shopify, can be used to define architectural rules about pack boundaries. More on packwerk below.
- packwerk-extensions uses packwerk’s extensible API to provide other types of pack boundaries.
- danger-packwerk provides automated inline pull request comments related to architecture boundary violations.
- code_ownership can be used to specify ownership of packs and integrate it into various developer tools.
- use_packs exposes a CLI, bin/packs, that makes it easy to create new packs, move files between packs, and more.
- pack_stats makes it easy to send metrics about pack adoption and modularization to your favorite metrics provider, such as DataDog (which has built-in support).
How Is a Pack Different From a Gem?
A Ruby gem is the Ruby community’s solution for packaging and distributing Ruby code. A gem is a great place to start new projects, and a great end state for code that’s been extracted from an existing codebase.
Packs are intended to help gradually modularize an application that has some conceptual boundaries, but is not yet ready to be factored into gems.
How Packwerk Works
Packwerk does the following three things in order:
- Parses a graph of “pack” nodes, creating edges based on references to Ruby constants.
- Provides a simple and extensible YAML-based declarative language for constraining that graph in various ways (e.g. dependencies and public API usage).
- Outputs the “diff” between the declared graph and the constrained graph as another YML file called package_todo.yml.
Packwerk analyzes each reference to a Ruby constant with something called a “checker.” A checker takes in a reference to a constant, which includes information about which pack is referencing the constant as well as which pack defines it. Each checker can define arbitrary rules for whether the reference is a “violation” or not. For example, the “dependency” checker considers a reference a violation if one pack references another without listing it as an explicit dependency.
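To make the checker idea concrete, here is a minimal toy model in plain Ruby. It is not packwerk’s real API: the Reference struct, the ToyDependencyChecker class, and the pack names are all made up for illustration. It only mirrors the rule described above, where a reference is a violation when the referencing pack has not declared the defining pack as a dependency.

```ruby
# Toy model of a packwerk-style "dependency" checker.
# NOTE: Reference and ToyDependencyChecker are hypothetical names, not packwerk classes.
Reference = Struct.new(:constant_name, :source_pack, :defining_pack, keyword_init: true)

class ToyDependencyChecker
  # declared_dependencies maps a pack name to the packs it explicitly depends on,
  # e.g. { "packs/payroll" => ["packs/feature_flags"] }
  def initialize(declared_dependencies)
    @declared_dependencies = declared_dependencies
  end

  # A reference is a violation when it crosses a pack boundary
  # and the source pack does not list the defining pack as a dependency.
  def violation?(reference)
    return false if reference.source_pack == reference.defining_pack

    declared = @declared_dependencies.fetch(reference.source_pack, [])
    !declared.include?(reference.defining_pack)
  end
end

checker = ToyDependencyChecker.new("packs/payroll" => ["packs/feature_flags"])

declared_use = Reference.new(
  constant_name: "FeatureFlags::Client",
  source_pack: "packs/payroll",
  defining_pack: "packs/feature_flags"
)
undeclared_use = Reference.new(
  constant_name: "Payroll::PayRun",
  source_pack: "packs/feature_flags",
  defining_pack: "packs/payroll"
)

puts checker.violation?(declared_use)   # false
puts checker.violation?(undeclared_use) # true
```

Other checkers follow the same shape: they receive the same kind of reference and apply a different rule to it.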
How to Get Started
The toolchain here has developed alongside Gusto’s journey to modularize its Rails application. Along the way, because we believe in gradual and incremental improvement, we wanted to make sure we always left a path for new applications to adopt this framework in a way that adds value incrementally. Here’s how I’d go about advising an organization to use these tools to modularize their app.
Step 1: Break up your code into domain-based packs
Before getting into the process of using packwerk or rubocop-packs to improve system boundaries, the simplest way to start is to move code files into packs. Check out the blog post linked in the introduction for a before and after of what this means.
use_packs exposes a helpful CLI, bin/packs, which developers at Gusto use to create new packs and move files into them.
There is no silver bullet for how your application should be broken up. The best way I’ve found is to sit down with subject matter experts (including product) and make a best first attempt at breaking up your application into smaller domains. Feel free to allow imperfection. Your first attempt at domain boundaries will never be perfect, and that’s okay!
The important thing to remember is that this is easy to change if you get it wrong, since you can always merge your packs back into a mega-pack or move files around again. In Rails, moving a file between packs only changes “autoload paths,” meaning how the constant the file defines is referenced remains unchanged. At Gusto, this means developers can experiment with boundaries inexpensively and with low risk, and they regularly move files around as their understanding of boundaries improves.
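Here is a small sketch of why moving a file between packs leaves its constant name alone. It is a simplified model of the Rails autoloading convention, not the real autoloader: the constant_name_for helper, the paths, and the naive camelize are assumptions for illustration, and custom inflections or collapsed directories are ignored. The constant name is derived from the path relative to an autoload root, so changing the root does not change the name.

```ruby
# Naive camelize: "pay_run" -> "PayRun". The real autoloader also supports custom inflections.
def camelize(segment)
  segment.split("_").map(&:capitalize).join
end

# Derives the expected constant name from a file path and a list of autoload roots.
# Hypothetical helper, shown only to illustrate the naming convention.
def constant_name_for(file_path, autoload_roots)
  root = autoload_roots.find { |candidate| file_path.start_with?("#{candidate}/") }
  raise ArgumentError, "#{file_path} is not under an autoload root" unless root

  relative = file_path.delete_prefix("#{root}/").delete_suffix(".rb")
  relative.split("/").map { |segment| camelize(segment) }.join("::")
end

autoload_roots = ["app/models", "packs/payroll/app/models"]

before_move = constant_name_for("app/models/payroll/pay_run.rb", autoload_roots)
after_move = constant_name_for("packs/payroll/app/models/payroll/pay_run.rb", autoload_roots)

puts before_move               # Payroll::PayRun
puts after_move                # Payroll::PayRun
puts before_move == after_move # true
```

Callers keep referring to Payroll::PayRun before and after the move, which is what makes experimenting with boundaries so cheap.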
Step 2: Find owners for packs
The next step is to use code_ownership to create code teams within your application and assign them to packs. Assigning packs to owners creates accountability, which is useful for everything from providing appropriate context in automated developer feedback and observability tools to simply making sure folks know who to talk to about changing code they are unfamiliar with.
Step 3: Begin enforcing your pack dependency structure
One thing packwerk can do is enforce the dependencies between your packs. By having each pack specify what it depends on, you can use packwerk to create a package_todo.yml file, which records “violations” between packs, such as the use of a pack without a stated dependency. This tool makes it easy to start creating a technically enforced architecture diagram.
Begin by turning dependency enforcement on (setting it to true) for a couple of packs you feel the most confident about. For example, you might feel very confident that your feature flags or authorization framework should not depend on your domain, but packwerk may reveal through “dependency violations” that they do. Fix these by aligning the code with the design. Once all dependency violations are resolved, you can switch the setting to strict. This locks in your progress and prevents architecture regressions.
Fixing dependency violations first helps ensure code is in the right place. Once we feel confident code is in the right place, we can move on to improving the API to that code.
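The difference between the two enforcement levels can be sketched as follows. The enforce_dependencies and dependencies keys are how I understand packwerk’s pack configuration from its documentation, so treat them as assumptions and check the packwerk docs for your version. The helper names, the outcome symbols, and the pack names are hypothetical. This is a toy model of the behavior, not how packwerk is implemented.

```ruby
require "yaml"

# Hypothetical configuration for a pack at packs/feature_flags.
# Key names are assumptions based on packwerk's documentation.
PACKAGE_YML = <<~YAML
  enforce_dependencies: true
  dependencies:
    - packs/logging
YAML

# Packs that are referenced but not declared as dependencies.
def undeclared_dependencies(config, referenced_packs)
  referenced_packs - config.fetch("dependencies", [])
end

# :ok        -> nothing to report
# :todo      -> violations can be recorded in package_todo.yml and paid down later
# :forbidden -> strict mode: violations have to be fixed rather than recorded
def enforcement_outcome(config, referenced_packs)
  setting = config["enforce_dependencies"]
  violations = undeclared_dependencies(config, referenced_packs)
  return :ok if !setting || violations.empty?

  setting == "strict" ? :forbidden : :todo
end

config = YAML.safe_load(PACKAGE_YML)
referenced = ["packs/logging", "packs/payroll"]

puts undeclared_dependencies(config, referenced).inspect
puts enforcement_outcome(config, referenced)
puts enforcement_outcome(config.merge("enforce_dependencies" => "strict"), referenced)
```

In this sketch the pack references packs/payroll without declaring it, so the plain true setting leaves room to record the violation, while strict refuses it outright.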
A quick note on packwerk
The capabilities in this blog post are shipping in packwerk major version 3, which, besides various bug fixes and performance improvements, will also ship with the ability to extend checkers (constraints around constant references), validators, output formatters, and more! If you’d like to use packwerk 3 today, as the toolchain already does, you can build off of main in your Gemfile:
gem 'packwerk', github: 'Shopify/packwerk', branch: 'main'
Step 4: Begin enforcing public API boundaries
packwerk-extensions supports the idea of a “privacy” checker. To use this, find packs that you believe should have a clear, well-abstracted API. Turn on privacy enforcement (set it to true) in those packs, run bin/packwerk update-todo, and try to fix most or all of the violations before shipping. This is important so that consumers have a public API to reach for when they get a privacy violation.
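As a rough illustration of what a privacy check decides, here is a toy model in plain Ruby. It assumes a pack’s public API lives in a dedicated folder inside the pack. I believe packwerk-extensions uses app/public by default, but that detail is not stated in this post, so verify it against the packwerk-extensions README. The privacy_violation? helper and the file paths are hypothetical.

```ruby
# Assumed location of a pack's public API, relative to the pack root.
PUBLIC_DIR = "app/public"

# A cross-pack reference is a privacy violation when the referenced constant
# is defined outside the defining pack's public folder.
def privacy_violation?(source_pack:, defining_pack:, defining_file:)
  return false if source_pack == defining_pack

  public_prefix = File.join(defining_pack, PUBLIC_DIR) + "/"
  !defining_file.start_with?(public_prefix)
end

puts privacy_violation?(
  source_pack: "packs/benefits",
  defining_pack: "packs/payroll",
  defining_file: "packs/payroll/app/public/payroll/api.rb"
) # false: the constant is part of the public API

puts privacy_violation?(
  source_pack: "packs/benefits",
  defining_pack: "packs/payroll",
  defining_file: "packs/payroll/app/models/payroll/pay_run.rb"
) # true: the constant is private to packs/payroll
```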
Step 5: Harden the good boundaries
Lastly, once your dependency structure is clean and the APIs between packs make the application simpler to understand, you can harden those boundaries with rubocop-packs. For example, the Packs/RootNamespaceIsPackName cop ensures that every pack establishes exactly one top-level namespace equal to the pack’s name, and Packs/DocumentedPublicApi ensures your public API is documented!
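To show the kind of rule a cop like Packs/RootNamespaceIsPackName expresses, here is a toy check in plain Ruby. It is not the cop’s implementation and does not use RuboCop at all. The helper names and example packs are hypothetical, and the camelization is deliberately naive.

```ruby
# "packs/feature_flags" -> "FeatureFlags" (naive camelization of the pack's folder name)
def expected_root_namespace(pack_path)
  File.basename(pack_path).split("_").map(&:capitalize).join
end

# True when the constant's top-level namespace matches the pack's name.
def root_namespace_matches_pack?(pack_path, constant_name)
  constant_name.split("::").first == expected_root_namespace(pack_path)
end

puts root_namespace_matches_pack?("packs/feature_flags", "FeatureFlags::Client") # true
puts root_namespace_matches_pack?("packs/feature_flags", "Payroll::PayRun")      # false
```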
There’s a lot of detail missing in this post about the problems we are trying to solve, the symptoms of an entangled, monolithic codebase, how to go about determining domain boundaries, technical strategies for refactoring towards better boundaries, management and cultural techniques and changes that need to be made to support this work, and more.
There’s also so much more potential for the packs ecosystem and for Ruby packages as a concept in general. At Gusto, our vision is for the packs ecosystem to support gradual modularization to the point where packs can be built, tested, and even deployed independently, as determined automatically based on boundary improvement progress. For example, at Gusto we use a feature we call “conditional builds” to run only the subset of tests that could possibly fail, based on the system package graph.
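The idea behind running only the tests that could fail can be sketched with a small graph walk. This is my own toy illustration, not Gusto’s implementation, which this post does not describe. The pack graph and the packs_to_test helper are hypothetical. Given which packs changed, it collects those packs plus every pack that depends on them, directly or transitively.

```ruby
require "set"

# Hypothetical pack graph: each pack maps to the packs it depends on.
PACK_DEPENDENCIES = {
  "packs/payroll" => ["packs/feature_flags", "packs/employees"],
  "packs/employees" => ["packs/feature_flags"],
  "packs/feature_flags" => [],
  "packs/benefits" => ["packs/employees"]
}.freeze

# Returns the changed packs plus all of their direct and transitive dependents.
def packs_to_test(dependencies, changed_packs)
  # Invert the graph so we can look up "who depends on this pack?"
  dependents = Hash.new { |hash, key| hash[key] = [] }
  dependencies.each do |pack, deps|
    deps.each { |dep| dependents[dep] << pack }
  end

  affected = Set.new
  queue = changed_packs.dup
  until queue.empty?
    pack = queue.shift
    next unless affected.add?(pack) # skip packs we have already visited

    queue.concat(dependents[pack])
  end
  affected.to_a.sort
end

puts packs_to_test(PACK_DEPENDENCIES, ["packs/employees"]).inspect
# ["packs/benefits", "packs/employees", "packs/payroll"]
```

A change to packs/employees leaves packs/feature_flags untested here, because nothing in packs/feature_flags can reference the changed code without a dependency violation. That is why a clean, enforced dependency graph is a prerequisite for this kind of optimization.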
We’d love for you to try out this ecosystem and give us feedback or contribute! I’d also love to hear about your and your organization’s approaches to modularization. I’m very supportive of all efforts to improve packaging and modularization capabilities in the Ruby language and Rails framework.
If you’d like to chat more, please reach out to me here or in the Ruby/Rails Modularity Slack Server. I’d be happy to chat more!