Three people standing next to a giant credit card
Illustration from https://undraw.co

In Rails applications, we typically wrap database changes that must succeed as a single atomic action in an ActiveRecord transaction (ActiveRecord::Base.transaction). However, transactions can sometimes include actions that are not SQL statements. When these actions are network calls, notifications, or async processes, they can yield results that are difficult to roll back or, worse, introduce bugs that are tricky to identify. In this post, we'll look at the problems these non-atomic actions pose and walk through a few examples of how to fix them. We'll also cover isolator, a gem that helps us detect these non-atomic violations.

A tale of a transaction

Let's take a look at the classic example of Bob sending money to Alice.

def create_transfer!(sender:, receiver:, amount:)
  ActiveRecord::Base.transaction do
    transfer = Transfer.create!(sender: sender, receiver: receiver, amount: amount)
    AccountService.withdraw!(sender, transfer)
    AccountService.deposit!(receiver, transfer)
  end
end

create_transfer!(sender: bob, receiver: alice, amount: 500)

This method creates a Transfer record and updates the sender's and receiver's accounts. It makes sense to wrap these operations in a transaction. If we fail to deposit money into Alice's account, then we should not withdraw money from Bob's or create an unprocessed Transfer record. But let's dive into what's going on in the withdraw and deposit methods.

class AccountService
  def self.withdraw!(account, transfer)
    account.update!(balance: account.balance - transfer.amount)

    BankWithdrawJob.perform_async(transfer.id)
    AccountMailer.update_account_balance_for_sender(account.id).deliver_now
  end

  def self.deposit!(account, transfer)
    account.update!(balance: account.balance + transfer.amount)

    BankDepositJob.perform_async(transfer.id)
    AccountMailer.update_account_balance_for_receiver(account.id).deliver_now
  end
end

Oh no, these methods don't simply update ActiveRecord models! They also queue up background jobs and send emails. If we fail to update Alice's balance, then all database changes in the transaction are rolled back as expected. However, our BankWithdrawJob is already queued up and the email to Bob cannot be unsent. In this example, the job will fail because it won't be able to find the transfer, which is a better outcome than actually withdrawing money from Bob's account. It’s still not a pleasant experience for Bob, who received a notification that didn’t match his account records.
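To make the failure mode concrete, here is a minimal sketch in plain Ruby, with no Rails involved. ToyDatabase and OUTBOX are hypothetical stand-ins for a database and an email provider, invented for this illustration. The staged writes disappear on failure, but the "email" does not:

```ruby
# Hypothetical toy database: writes are staged and kept only if the block finishes.
class ToyDatabase
  attr_reader :rows

  def initialize
    @rows = {}
  end

  # Runs the block against a copy of the rows. The copy replaces the
  # real rows only when the block returns without raising.
  def transaction
    staged = @rows.dup
    yield staged
    @rows = staged
  rescue StandardError
    # Rollback: the staged copy is discarded and @rows is left untouched.
    nil
  end
end

OUTBOX = [] # stands in for an email provider; nothing pushed here can be rolled back

db = ToyDatabase.new
db.transaction do |rows|
  rows[:transfer] = { amount: 500 }
  rows[:bob_balance] = -500
  OUTBOX << "Bob, 500 was withdrawn from your account" # side effect escapes the rollback
  raise "deposit to Alice failed"
end

puts db.rows.inspect # the staged database writes are gone
puts OUTBOX.inspect  # the email is still there
```

The rollback only knows about the state it manages. Anything that leaves the process, such as an email or a queued job, is outside its reach.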

Moreover, these non-atomic actions can also interfere with the database calls and increase the chances of failure. If an operation such as a network call or the generation of a huge file takes a really long time to complete, it keeps your database transaction open for that duration. And if an error is raised while executing a non-atomic action, it will cause the transaction to roll back perfectly valid database changes.

But the troubles don't end there. We observed that background jobs queued inside transactions can sometimes be run before the transaction completes. This can either lead to job failures or produce undesirable results because the records aren't in the state we expect them to be. If you find instances of background jobs in your system that sometimes fail and subsequently succeed on retry without other transient behaviors at play, the culprit may be this race condition introduced by jobs enqueued before the transaction committed.
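The race can be pictured with a small, hypothetical in-memory model (plain Ruby, no real queue or database). The hashes committed and staged stand in for what other connections can and cannot see while the transaction is still open:

```ruby
# Hypothetical model: `committed` is what other database connections can see.
committed = {}
staged = {}     # writes made inside the still-open transaction
job_queue = []

# A worker runs on its own connection, so it reads only committed rows.
run_job = lambda do |transfer_id|
  committed.key?(transfer_id) ? :processed : :record_not_found
end

# Inside the transaction: write the row, then enqueue the job.
staged[1] = { amount: 500 }
job_queue << 1

# A fast worker picks the job up before COMMIT.
first_attempt = run_job.call(job_queue.first)

# COMMIT: the staged rows become visible to everyone.
committed.merge!(staged)

# The retry now finds the row.
second_attempt = run_job.call(job_queue.first)

puts first_attempt
puts second_attempt
```

This matches the symptom described above: a job that fails once and then succeeds on retry with no other change in the system.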

Fixing non-atomic violations

The most obvious approach is to move the non-atomic actions out of the transaction:

def create_transfer!(sender:, receiver:, amount:)
  # Assign the block's return value so that `transfer` is visible after the transaction.
  transfer = ActiveRecord::Base.transaction do
    new_transfer = Transfer.create!(sender: sender, receiver: receiver, amount: amount)
    sender.update!(balance: sender.balance - amount)
    receiver.update!(balance: receiver.balance + amount)
    new_transfer
  end

  perform_post_transfer_tasks(transfer, sender, receiver)
end

# All non-atomic ops are moved into this method
def perform_post_transfer_tasks(transfer, sender, receiver)
  BankWithdrawJob.perform_async(transfer.id)
  BankDepositJob.perform_async(transfer.id)
  AccountMailer.update_account_balance_for_sender(sender.id).deliver_now
  AccountMailer.update_account_balance_for_receiver(receiver.id).deliver_now
end

However, sometimes it may not be desirable to structure your code in such a way. Additionally, your method may be nested in another parent transaction. Suppose the transfer is just a step in an order fulfillment transaction as follows:

ActiveRecord::Base.transaction do
  order = Order.create!(buyer: bob, seller: alice, product: product, qty: qty)
  create_transfer!(sender: bob, receiver: alice, amount: product.price * order.qty)
  order.fulfill!
end

It could be transactions all the way up! Our problem would be solved if we could leave the non-atomic code inside the transaction but execute it only after the transaction commits. We know that ActiveRecord models have after_commit hooks that are executed after the transaction completes. Wouldn't it be nice if the same behavior existed for explicit transactions? That's exactly the solution provided by the gem after_commit_everywhere, which allows us to safely co-locate non-atomic code inside transactions by wrapping it in after_commit blocks as follows:

class AccountService
  def self.withdraw!(account, transfer)
    account.update!(balance: account.balance - transfer.amount)

    AfterCommitEverywhere.after_commit do
      BankWithdrawJob.perform_async(transfer.id)
      AccountMailer.update_account_balance_for_sender(account.id).deliver_now
    end
  end

  def self.deposit!(account, transfer)
    account.update!(balance: account.balance + transfer.amount)

    AfterCommitEverywhere.after_commit do
      BankDepositJob.perform_async(transfer.id)
      AccountMailer.update_account_balance_for_receiver(account.id).deliver_now
    end
  end
end
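To show the idea behind deferring work until the outermost transaction commits, here is a simplified sketch in plain Ruby. ToyTransaction is a hypothetical class written for this illustration and is not how after_commit_everywhere is implemented. It only models nesting depth and a list of deferred blocks:

```ruby
# Hypothetical, simplified model of "run this after the outermost commit".
class ToyTransaction
  @depth = 0      # how many transaction blocks are currently open
  @callbacks = [] # blocks waiting for the outermost commit

  class << self
    def transaction
      @depth += 1
      begin
        result = yield
      rescue StandardError
        @depth -= 1
        @callbacks.clear if @depth.zero? # rollback: drop the deferred work
        raise
      end
      @depth -= 1
      flush_callbacks if @depth.zero?    # only the outermost commit runs them
      result
    end

    def after_commit(&block)
      if @depth.zero?
        block.call # no open transaction, so run right away
      else
        @callbacks << block
      end
    end

    private

    def flush_callbacks
      pending = @callbacks
      @callbacks = []
      pending.each(&:call)
    end
  end
end

log = []
ToyTransaction.transaction do
  log << "order created"
  ToyTransaction.transaction do
    log << "balances updated"
    ToyTransaction.after_commit { log << "job enqueued" }
  end
  log << "order fulfilled"
end

rolled_back = []
begin
  ToyTransaction.transaction do
    ToyTransaction.after_commit { rolled_back << "email sent" }
    raise "deposit failed"
  end
rescue RuntimeError
  # The deferred block was dropped together with the rollback.
end

puts log.inspect
puts rolled_back.inspect
```

In the first run the deferred block fires only after the outer block finishes, even though it was registered inside the nested one. In the second run it never fires, which is the behavior we want for jobs and emails.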

So far, we've been dealing only with explicit transactions. What about implicit transactions, those that are parts of Transfer.create! and account.update! calls? Suppose we have these callbacks in the Transfer model and they shouldn't be executed as part of the transaction:

class Transfer < ApplicationRecord
  after_create :notify_admin_for_review
  after_save :send_data_for_risk_analysis
end

We know that after_create and after_save are executed before the transaction commits, so we simply move them to after_commit hooks instead:

class Transfer < ApplicationRecord
  after_create_commit :notify_admin_for_review
  after_commit :send_data_for_risk_analysis, on: [:create, :update]
end

As we've seen in this section, the fixes aren't complicated once we identify the problems. But how do we reliably identify the violations in the first place? Surely, we can't rely on developers to consistently spot them during development or, worse, to discover them as bugs in production.

Isolator

We set out to find a way to automate the detection of these non-atomic actions and quickly found the isolator gem. Isolator works by tracking whether we're in a transaction and raising an error when a non-atomic action is invoked. The gem comes with several default adapters that can detect HTTP calls, mailers, and background jobs (Sidekiq and Resque are also supported). We can also add custom adapters to support other actions.

Internally, isolator keeps a transaction count for each database connection. It monitors every SQL statement to detect a beginning or end of a transaction and increments/decrements the count accordingly. Because of this overhead, it's not recommended to use isolator in production. At Gusto, we enable isolator in the test environment and rely on our high test coverage to surface non-atomic violations in CI.
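The counting idea described above can be sketched in a few lines of plain Ruby. ToyIsolator, observe_sql, and guard! are hypothetical names made up for this illustration. Isolator's real internals and API differ, and the SQL matching here is deliberately naive:

```ruby
# Hypothetical sketch of per-connection transaction counting.
class ToyIsolator
  class ViolationError < StandardError; end

  def initialize
    @open_transactions = Hash.new(0) # connection id => nesting depth
  end

  # Called for every SQL statement seen on a connection.
  def observe_sql(connection_id, sql)
    case sql
    when /\A\s*(BEGIN|SAVEPOINT)/i
      @open_transactions[connection_id] += 1
    when /\A\s*(COMMIT|ROLLBACK|RELEASE SAVEPOINT)/i
      @open_transactions[connection_id] -= 1 if @open_transactions[connection_id].positive?
    end
  end

  def within_transaction?
    @open_transactions.values.any?(&:positive?)
  end

  # Guard to call before a non-atomic action such as enqueueing a job.
  def guard!(action)
    return unless within_transaction?

    raise ViolationError, "#{action} attempted inside a db transaction"
  end
end

isolator = ToyIsolator.new
isolator.observe_sql(:primary, "BEGIN")
isolator.observe_sql(:primary, "INSERT INTO transfers (amount) VALUES (500)")

violation =
  begin
    isolator.guard!("enqueue BankWithdrawJob")
    nil
  rescue ToyIsolator::ViolationError => e
    e.message
  end

isolator.observe_sql(:primary, "COMMIT")
isolator.guard!("enqueue BankWithdrawJob") # no error once the transaction is closed

puts violation
```

Inspecting every statement like this is the kind of per-query overhead that makes such a check better suited to the test environment than to production.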

Once installed, isolator will surface non-atomic violations if a test invokes the code path that contains the problem. Here's a sample output of a violation that occurs when a background job is enqueued inside a transaction:

Isolator::BackgroundJobError: You are trying to enqueue a background job inside db transaction. In case of transaction failure, this may lead to data inconsistency and unexpected bugs
Example output for an Isolator::BackgroundJobError

Isolator is pretty smart about detecting dependencies and automatically configures the relevant adapters when it initializes. For this reason, we install it with the require: false option and load it as part of spec support when all core dependencies are already loaded.

group(:test) do
  gem 'isolator', require: false
end

In a large code base, you may find it overwhelming to fix all violations at once. A strategy that works really well for us is to deal with violations one category at a time. By disabling all adapters and re-enabling them one by one, we get to control when to apply the fixes. For instance, here's how to disable the HTTP adapter for the entire test suite:

# spec/support/isolator.rb
require 'isolator'
Isolator.adapters.http.disable!

Summary

In this post, we learned about the problems non-atomic actions can cause to your application. We also covered tools such as isolator and after_commit_everywhere that help us detect and fix the problems. These tools have helped us remove numerous transactional violations and continue to safeguard our app against non-atomic operations. We hope the information in this post will also help you keep your app transactionally safe.