
Building Rails at Scale: Inside Gusto’s Biolith Architecture

When most startups outgrow their Rails monolith, they face a tough decision: stick with the monolith or break it apart. Gusto took a unique approach: they built what they call a “biolith,” two separate Rails monoliths serving different parts of their business. With over 600 engineers working across these applications, their journey offers valuable insights for any team scaling Rails.

What is a Biolith?

Rather than having one monolith, Gusto operates with two distinct monoliths: approximately 80% of their engineers, code, and product sit in one monolith, while just under 20% exists in a separate application called Hawaiian Ice, which handles all benefits-related functionality.

This architectural decision wasn’t initially strategic; it was driven by HIPAA compliance requirements, because all protected health information (PHI) had to be stored separately from the main application. However, this compliance-driven split turned out to be one of their luckiest architectural decisions.

The Benefits of Separation

Peter Compernolle, architectural lead for Gusto’s benefits domain, explains the advantages: working with a smaller application makes it significantly easier to move quickly compared to a very large application.

The benefits application operates behind a firewall with strict access controls. All benefits-related admin functionality is inaccessible from the outside world by default, requiring explicit API design to share any data. This security-first approach forces intentional architectural decisions.

Challenges of the Main Monolith

Miguel Conde, who leads the effort to extract the time product from Gusto’s main monolith, paints a different picture of working in the larger codebase. The main monolith is approximately double the size of the benefits application, and changes must be made carefully due to extensive coupling between services in this 11-12 year old application.

The deployment pipeline must build the entire monolith and run all test suites before going live, and if there’s an incident, deploys are blocked across the entire application even if a specific service isn’t affected.

These challenges are common when scaling Rails applications for enterprise use, particularly for SaaS platforms managing multiple tenants and high transaction volumes.

The Extraction Strategy

To address these pain points, Gusto is extracting certain domains from the main monolith. They identified scheduling, time tracking, and time off as three distinct systems that can operate independently of each other, though they have intertwined communications.

The key principle for defining boundaries? If code is needed immediately for a system to operate and cannot support eventual consistency, it’s probably an integral part of that system and should be included.

Packwerk: Enforcing Boundaries

Packwerk has become essential for managing these extractions. The tool helps isolate code within the monolith by preventing calls to models that don’t belong to a service and blocking access to private methods.

During extraction, Packwerk identifies public Ruby APIs that need transformation into either Kafka events or GraphQL endpoints by preventing any outbound calls from a pack.

The Game-Changer: Sorbet + Packwerk

While Peter initially dissented against using Packwerk, preferring the technical constraints of gems over tooling-level constraints, the introduction of Sorbet changed everything, making Packwerk at least ten times more valuable.

Sorbet requires defining types in method signatures, forcing developers to declare which classes they’re using; Packwerk can then validate those classes against defined boundaries.

The combination helped them identify hotspots in their codebase, like bulldozing big hills from a mountain of dirt to see where there was the most signal.

Making Active Record Models Private

One of Gusto’s most important patterns involves making Active Record models private by default. By default, an Active Record model has approximately 350 methods, not including specific accessors added per column, so reducing this footprint has been a major effort.

For new models especially, they make them private by convention, typically suffixing the model name with “record” (like “policy_record”) and making it private to its owning namespace.
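
The private-by-default idea can be sketched in plain Ruby, without Rails or Packwerk. This is only an illustration under stated assumptions: `Benefits`, `PolicyRecord`, `Policy`, and `find_policy` are hypothetical names, a Struct stands in for an Active Record model, and Ruby’s built-in `private_constant` stands in for the tooling-level privacy checks described above.

```ruby
# Plain-Ruby sketch: the record class is hidden inside its owning namespace
# and callers only see a small, explicit API. All names are hypothetical.
module Benefits
  # Stand-in for an Active Record model; suffixed with "Record" by convention.
  PolicyRecord = Struct.new(:id, :name, keyword_init: true)
  private_constant :PolicyRecord

  # Read-only value handed to callers instead of the record itself.
  Policy = Struct.new(:id, :name, keyword_init: true)

  # Stand-in for the database table.
  ROWS = { 1 => { id: 1, name: "Dental" } }.freeze
  private_constant :ROWS

  # The only public entry point: returns a frozen value object or nil.
  def self.find_policy(id)
    row = ROWS[id]
    return nil unless row

    record = PolicyRecord.new(**row)
    Policy.new(id: record.id, name: record.name).freeze
  end
end
```

Code outside the namespace that references `Benefits::PolicyRecord` raises a NameError, so the roughly 350 inherited methods of a real model would never leak past the boundary in this arrangement.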

They’ve even open-sourced a tool called Explicit Active Record that raises an error if you try to write to a record in any way, requiring explicit permission through a special method call.

The Service-Repository Pattern

Gusto’s time team has adopted a service-repository pattern that differs from typical Rails conventions. The repository layer (Active Record) handles database interaction, while the service layer contains all business logic and external dependency interactions.

Models are kept as simple as possible with minimal helper functions, while tightly controlled service APIs expose exactly what functionality is available to other systems.

Miguel emphasizes: While many teams put complex logic in models to guarantee validations, their team keeps models as protected classes requiring review for any changes, relying on Sorbet and Packwerk to detect violations.
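
A minimal sketch of this split, assuming an in-memory Hash in place of Active Record. `TimesheetRepository`, `TimesheetService`, and the 24-hour rule are invented for illustration and are not Gusto’s code.

```ruby
# Repository: the only place that talks to storage. An in-memory Hash stands
# in for an Active Record model here.
class TimesheetRepository
  def initialize
    @rows = {}
    @next_id = 1
  end

  def insert(employee_id:, hours:)
    id = @next_id
    @next_id += 1
    @rows[id] = { id: id, employee_id: employee_id, hours: hours }
    @rows[id].dup # hand back a copy so callers cannot mutate stored rows
  end

  def for_employee(employee_id)
    @rows.values.select { |row| row[:employee_id] == employee_id }.map(&:dup)
  end
end

# Service: owns the business rules and is the API other systems call.
class TimesheetService
  MAX_HOURS_PER_ENTRY = 24 # hypothetical business rule

  def initialize(repository)
    @repository = repository
  end

  def log_hours(employee_id:, hours:)
    unless hours.positive? && hours <= MAX_HOURS_PER_ENTRY
      raise ArgumentError, "hours must be positive and at most #{MAX_HOURS_PER_ENTRY}"
    end

    @repository.insert(employee_id: employee_id, hours: hours)
  end

  def total_hours(employee_id)
    @repository.for_employee(employee_id).sum { |row| row[:hours] }
  end
end
```

The validation lives in the service rather than the model, so the repository stays a thin persistence layer and the rule has a single, reviewable home.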

This pattern aligns with modern Rails architecture principles that prioritize clean separation of concerns. Teams looking to implement similar patterns can benefit from understanding service-repository architecture approaches that extract complex business logic into focused, testable components.

Embracing Eventual Consistency

Working with distributed systems requires accepting eventual consistency. Gusto uses an eventing infrastructure (being replaced by Kafka) where services emit events when changes occur, and interested consumers update their tables as needed, though there’s no guarantee when messages arrive.

The after_commit_everywhere gem has been crucial for this approach. It checks if code is in a transactional state and queues operations to execute only after a full transaction commit, preventing events from being emitted for rolled-back transactions.
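
The underlying idea can be sketched with a toy transaction object. This is not the gem’s implementation or API; `ToyTransaction`, `EMITTED_EVENTS`, and the event names are hypothetical, and the real gem hooks into Active Record’s transaction machinery rather than a hand-rolled class.

```ruby
# Toy transaction illustrating "emit only after commit".
class ToyTransaction
  def initialize
    @callbacks = []
  end

  # Queue work that should run only if the transaction commits.
  def after_commit(&block)
    @callbacks << block
  end

  # Fire the queued callbacks in the order they were registered.
  def commit!
    @callbacks.each(&:call)
  end

  # Run the block; on any error the queued callbacks are simply discarded.
  def self.run
    tx = new
    begin
      yield tx
    rescue StandardError
      return false # rolled back: nothing is emitted
    end
    tx.commit!
    true
  end
end

EMITTED_EVENTS = []

# Commits, so the event is emitted.
ToyTransaction.run do |tx|
  tx.after_commit { EMITTED_EVENTS << "timesheet.updated" }
end

# Fails before commit, so its event is never emitted.
ToyTransaction.run do |tx|
  tx.after_commit { EMITTED_EVENTS << "timesheet.deleted" }
  raise "simulated failure"
end
```

After both runs, only the event from the committed transaction is in `EMITTED_EVENTS`, which is the guarantee consumers rely on.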

Temporal Data Challenges

Both sides of the biolith deal with complex temporal data requirements. In the time domain, all tables are temporal in nature, and some are bitemporal, tracking both what the data was at a given point and what the system thought it was at that point.

This enables powerful auditing capabilities. They can answer questions like whether someone retroactively changed a timesheet by maintaining both effective start/end dates and version start/end dates.
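
A small sketch of how the two date ranges answer that kind of question. The field names, the half-open interval convention, and the sample data are assumptions for illustration, not Gusto’s schema.

```ruby
require "date"

# One row per version of a timesheet entry (field names are hypothetical).
# effective_*: when the hours apply in the real world.
# version_*:   when the system held this belief (nil end means still current).
Entry = Struct.new(:hours, :effective_start, :effective_end,
                   :version_start, :version_end, keyword_init: true)

# Half-open interval check: start is inclusive, end is exclusive or open.
def covers?(start_on, end_on, date)
  start_on <= date && (end_on.nil? || date < end_on)
end

# What did the system believe on known_on about the hours for work_date?
def hours_as_known(entries, work_date:, known_on:)
  row = entries.find do |entry|
    covers?(entry.effective_start, entry.effective_end, work_date) &&
      covers?(entry.version_start, entry.version_end, known_on)
  end
  row&.hours
end

# In this simplified model, a retroactive change shows up as more than one
# version row covering the same work date.
def retroactively_changed?(entries, work_date:)
  entries.count { |entry| covers?(entry.effective_start, entry.effective_end, work_date) } > 1
end

MARCH_3 = Date.new(2024, 3, 3)
ENTRIES = [
  # Original belief, recorded on March 3 and superseded on March 10.
  Entry.new(hours: 8, effective_start: MARCH_3, effective_end: MARCH_3 + 1,
            version_start: Date.new(2024, 3, 3), version_end: Date.new(2024, 3, 10)),
  # Correction entered on March 10 for the same work date.
  Entry.new(hours: 6, effective_start: MARCH_3, effective_end: MARCH_3 + 1,
            version_start: Date.new(2024, 3, 10), version_end: nil)
].freeze
```

Asking what was known on March 5 returns the original 8 hours, asking as of March 12 returns the corrected 6, and the presence of two versions for the same work date is the audit trail of the retroactive edit.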

Local Development at Scale

With 600+ engineers, local development becomes challenging. Gusto uses two primary approaches:

Demo Scenarios: Writing code to build entire universes for verifying or demoing features in ephemeral environments.

Company Pull: A system that scrubs production data of all sensitive information while preserving the shape of the data, allowing developers to work with realistic data locally.

Anytime a column is added, CI requires defining whether it’s sensitive and how to sanitize it, with all changes reviewed by legal and compliance teams.
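
The rule that every column must declare its sensitivity could look roughly like this. The policy names, the `EMPLOYEE_POLICY` table, and the scrubbing strategies are invented for the sketch; a real scrubber would need strategies vetted by security and compliance (an unsalted hash, for example, is not a safe way to anonymize low-entropy values).

```ruby
require "digest"

# Hypothetical scrubbing strategies, keyed by policy name.
SANITIZERS = {
  keep: ->(value) { value },
  # Deterministic digest keeps equal values equal across rows.
  hash: ->(value) { Digest::SHA256.hexdigest(value.to_s)[0, 12] },
  # Same length as the original, none of the original characters.
  mask: ->(value) { "x" * value.to_s.length }
}.freeze

# Hypothetical per-column declarations for one table.
EMPLOYEE_POLICY = { id: :keep, email: :hash, ssn: :mask }.freeze

# Mirrors the CI check: any column without a declared policy is reported.
def undeclared_columns(columns, policy)
  columns - policy.keys
end

# Scrub one row, refusing to proceed if any column lacks a policy.
def sanitize_row(row, policy)
  missing = undeclared_columns(row.keys, policy)
  raise ArgumentError, "no sanitization policy for: #{missing.join(', ')}" unless missing.empty?

  row.to_h { |column, value| [column, SANITIZERS.fetch(policy.fetch(column)).call(value)] }
end
```

Failing closed on an undeclared column is the point of the design: a newly added column cannot reach a developer laptop until someone has decided how to treat it.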

The GraphQL Trade-off

Gusto uses federated GraphQL extensively, which brings both benefits and challenges. GraphQL makes it easy for implementers to not think about where data comes from, but it also makes it easy to avoid thinking about error handling and eventual consistency.

Peter offers a measured perspective: As services split up and pages depend on five different services, they’ll need to think carefully about which combinations of services being up or down they can tolerate, a problem GraphQL currently obscures.
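
One way to make that question explicit is to declare, per page, which services are required and which are optional. The sketch below is an assumption-laden illustration: the service names and the three statuses are hypothetical and not tied to any GraphQL library.

```ruby
# Hypothetical page definition: services it cannot render without, and
# services whose absence only degrades the page.
DASHBOARD_PAGE = {
  required: %i[payroll time_tracking],
  optional: %i[benefits scheduling time_off]
}.freeze

# Returns :ok, :degraded, or :unavailable for a given list of healthy services.
def page_status(page, healthy)
  return :unavailable unless (page[:required] - healthy).empty?
  return :degraded unless (page[:optional] - healthy).empty?

  :ok
end
```

With five services there are 32 up/down combinations; writing the required and optional sets down turns an implicit assumption into something that can be reviewed and tested.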

Testing Strategy

Gusto maintains three levels of testing:

  1. Unit tests: For edge cases and basic functionality validation
  2. Integration tests: Primarily testing the GraphQL layer to ensure services are called correctly
  3. End-to-end tests: Smoke tests with headless browsers for critical user flows

However, extraction is forcing a rethinking of this strategy, particularly for integration and end-to-end tests that assumed a monolithic architecture and used FactoryBot factories across the entire codebase.

Key Principles for Rails at Scale

Several themes emerge from Gusto’s experience:

Make Everything Explicit: Rather than relying on Rails’ hidden functionality, they strive to make all dependencies explicit and visible, especially at their scale where implicit behavior becomes difficult to control.

Limit Meta-programming: They forbid most meta-programming except what’s tucked away in gems with clear APIs, and they make Active Record models private to harness power while protecting themselves.

Think in Services: As Miguel notes, Sorbet unlocks object-oriented programming within Rails, making it easier to use encapsulation and inheritance effectively.

Organizations facing similar architectural decisions often benefit from expert Rails consulting guidance to navigate the complexities of modular design, database optimization, and security frameworks at enterprise scale.

The Formula One Metaphor

Miguel offers an apt metaphor for Rails at scale: Rails is like a Formula 1 car, lightweight and fast with no baggage, but also lacking safety features, requiring the driver (developer) to control it well and follow guidelines to avoid crashing.

As the app increases in complexity, it’s like adding weight to the Formula 1 car: speed is weighed down by complexity, and maneuverability becomes difficult because turning might shift many interconnected pieces.

While Rails provides rapid development capabilities through convention over configuration, maintaining that speed at scale requires deliberate architectural decisions and disciplined engineering practices.

Conclusion

Gusto’s biolith architecture demonstrates that there’s no one-size-fits-all approach to scaling Rails. Their journey shows the importance of:

  • Defining clear service boundaries based on operational independence
  • Using tools like Packwerk and Sorbet to enforce architectural decisions
  • Making implicit Rails behavior explicit through service layers
  • Embracing eventual consistency across distributed systems
  • Maintaining developer productivity through thoughtful local development tools

For teams facing similar scaling challenges, Gusto’s experience proves that Rails can work at significant scale, but success requires thoughtful architectural patterns, disciplined engineering practices, and the right tooling to enforce boundaries that Ruby doesn’t provide by default.
