I am part of the Partners Engineering team at Gusto, building our product for accountants and bookkeepers. Our team recently decided to take on more experiment-based projects - putting out new features quickly and iterating on them based on customer feedback. We wanted to work on projects defined by outcomes.

In part one of this series, we discussed how to identify an outcome-based experiment and define a realistic minimum lovable product (MLP). In this final part of the series, we will take a closer look at how to actually bring an experiment to life as a full-fledged feature. If you think the hard work is complete once the project is defined, think again! Scope creep is a real and constant danger. Deciding how to iterate on an initial experiment can be easy - if you have the right data. Finally, there is some cleanup to do! Fast experimentation can be messy, and a responsible engineer should always strive to leave their code in a tidy state.

Start with hardcoding

My coworker Alyssa Hester and I set out to build a way to recommend our trusted accountant partners to Gusto small businesses. We wanted to create custom recommendations by matching these two parties on their common characteristics.

However, we needed to keep in mind that the goal of this experiment was to determine whether we should move forward with building a recommendation engine. An ideal experiment tests a hypothesis with the minimum possible resources. For us, this meant reaching for a technique that is typically frowned upon in a production environment - hardcoding.

This experiment would be heavily hardcoded and purposefully not scalable. We would launch it to a select group of small businesses. For each of these small businesses, our data science team would use an algorithm to manually generate a recommended accountant partner. This gave us a starting point from which we could determine whether it was worth the engineering time to build out a full-featured recommendation engine.
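To make "hardcoded" concrete, here is a minimal sketch of the shape this can take; the IDs, the mapping, and the recommended_partner_for helper below are purely illustrative and not from our actual codebase:

```python
# Illustrative sketch only -- every ID here is made up.
# The pairings come from the data science team's offline work;
# the application simply looks them up.
HARDCODED_RECOMMENDATIONS = {
    # company_id: recommended accounting firm (partner) id
    101: 9001,
    102: 9004,
    103: 9001,
}

def recommended_partner_for(company_id: int) -> int | None:
    """Return the manually generated recommendation, or None if the
    company is not part of the experiment group."""
    return HARDCODED_RECOMMENDATIONS.get(company_id)
```

Companies outside the mapping simply never see the experiment, which is what keeps both the scope and the engineering cost small.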

With our hardcoded MLP, it was time to get to work and start building! However, as soon as we dove into the project, we started coming up with new ideas that could potentially improve our outcomes.

“No” is my new favorite word

Throughout the project, our team came up with many wonderful ideas that would undoubtedly have increased our chances of success. There was only one problem - we didn’t have time to implement any of them.

Every time we came across a new idea that could improve our results, we had to decide whether to make time for the additional feature. To decide, we asked ourselves, “Is there an easier way to do this while still bringing value to the customer? Do we need this new feature to understand whether customers will use the product?”

Some great ideas that we had to say “no” to:

Idea: Providing more than one recommendation to the accounting firm
Instead: We referred companies to our Partner Directory

Idea: Providing our sales team with detailed data in Salesforce on any connections made
Instead: We CC’d our sales team on the initial email sent when a connection was made between a company and a Partner

Just because a realistic MLP has been defined does not mean the project is in the clear. Scope creep is a real danger that should be fought at all times. The ultimate goal of the project should be kept in sight - building a feature that would increase accountant client adds.

Collect all the data

The point of this experiment was to determine whether we should invest in a full-featured recommendation engine. Experiments mean data. All of this work would have been pointless if we did not collect the data needed to make that call.

We tracked every possible data point - views, clicks, connections. Anything that could be measured was measured! From this data we built a dashboard with all the relevant information, and it was this information that allowed us to iterate on the experiment.
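As a rough sketch of what this instrumentation can look like (the track_event helper and the event names below are hypothetical stand-ins, not our actual analytics API):

```python
from datetime import datetime, timezone

def track_event(name: str, company_id: int, **properties) -> None:
    """Hypothetical analytics helper -- in practice this would forward the
    event to whatever pipeline feeds your dashboard."""
    event = {
        "name": name,
        "company_id": company_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        **properties,
    }
    print(event)  # stand-in for enqueueing or sending the event

# Fire an event at every step of the funnel so view, click, and
# connection rates can be computed later.
track_event("recommendation_viewed", company_id=101, partner_id=9001)
track_event("recommendation_clicked", company_id=101, partner_id=9001)
track_event("connection_requested", company_id=101, partner_id=9001)
```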

Iterate on the experiment

We launched our hardcoded experiment with a few days to spare! We sat back and waited for the results to roll in, tracking view and click-through rates for companies viewing their recommended accounting firms. After a few months we determined that small businesses were very interested in viewing their custom recommendation for an accounting firm. However, the performance data also showed that the placement of our prompt to view the recommendation and the flow for connecting with the recommended accountant could both be improved.
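As an illustration of the funnel math behind those rates (the counts below are made up, not our actual results):

```python
# Made-up counts purely to show how the rates are derived.
companies_in_experiment = 200
viewed_recommendation = 120
clicked_through = 45

view_rate = viewed_recommendation / companies_in_experiment   # 120 / 200 = 60%
click_through_rate = clicked_through / viewed_recommendation  # 45 / 120 = 37.5%

print(f"View rate: {view_rate:.0%}, click-through rate: {click_through_rate:.1%}")
```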

We took these learnings and launched a second experiment, placing the recommendation prompt in a new location and increasing the number of custom recommendations. It only took us a few days to build on the existing experiment. In round two, interest in the recommendations improved further. After two rounds of experimentation and positive feedback from both small businesses and Gusto Partners, we decided it was worth the effort to build a fully scalable recommendation engine.

We learned enough from each experiment to justify continued investment, and after the second iteration we decided it was time to build our full-fledged recommendation engine. Doing so took significantly less effort than it would have without experimentation, because we were able to build on the existing iterations. We are still waiting for the recommendation engine’s data to bake before we can determine whether we hit our defined outcomes. In any case, we are regularly connecting small businesses on Gusto with our accountant partners, and initial results are promising!

Clean up

Engineers may be asking: with all this focus on outcomes and fast delivery, was there any impact on code quality? The answer is a hard yes. We purposefully introduced non-scalable code into our codebase because it allowed us to build within the given timeline. While this was a necessary sacrifice, it certainly was not something I would want a coworker to git blame and attribute to my name a year from now.

As an engineer, you are responsible for the cleanliness of your codebase, and I did not consider this project complete until the unused code was cleaned up after the experiment was turned off. After building the full-featured recommendation engine, Alyssa and I deleted the leftover experiment code and made sure our codebase was back in a tidy state.

Keep going with outcome-based experiments

After going through two experiments and finally launching the full-featured recommendation engine, we are happy with the outcomes of this process.

We spent minimal engineering, product, and design time and shipped a minimum lovable product in only six weeks. Because the MLP launched so quickly, the customers in our experiment were able to benefit from these accounting firm recommendations within six weeks instead of months. As an engineering team, we then iterated based on real customer feedback and data collected from our users. Finally, once we had gone through two experiment iterations, we were able to use what we learned, and some of the existing code, to speed up development of our full-featured recommendation engine.

While we are still tracking final metrics for outcomes from our recommendation engine, the early signs are promising. Overall, this outcome-based experiment allowed us to help both our accountant partners and our small businesses. Our team hopes to run many more of these experiments in the future.