Tackling the Monolith/Microservices Dilemma at Instawork

by Adam Stepinski | Instawork Engineering
Instawork’s main product is implemented as a monolithic codebase using the Django framework. This choice has served us well from the founding of the company, and it exemplifies our guiding principle of “Respecting the Craft”:
Develop deep expertise with a limited set of tools, and pick technologies carefully to solve a particular problem
By sticking with a monolith, engineers at Instawork need to develop expertise with only one language (Python) and one set of conventions (Django). With that set of tools, our team is able to work on any feature across the entire product. This flexibility has helped us stay nimble and pivot engineering effort to the most important initiatives at the company. Additionally, a monolith greatly simplifies our CI/deployment process and production monitoring setup. This means more engineering effort can go to developing the product, rather than to maintaining a complex infrastructure (such as Kubernetes).
Every engineering decision comes with tradeoffs. Common criticisms of monoliths include:

- tight coupling between modules as the codebase grows
- slower builds, test suites, and deployments
- the inability to scale or deploy individual components independently
These problems become more acute as the size of the monolith grows. Indeed, over the last year we started noticing early signs of issues in our Django codebase: spaghetti code and tight coupling between modules. We decided to proactively address the problem before it crippled our development velocity.
Microservices are commonly seen as a solution to the problems of a growing monolith. But I’d argue it’s rarely the right choice to jump from a monolith straight to microservices. It’s true that each individual microservice will be easier to scale, test, and deploy than the monolith it came from. However, that comes at the price of a more complex orchestration needed across all services. Individual developers pay this price when figuring out which set of services they need to run locally to test their feature end-to-end. And the company feels the pain when it needs to hire dedicated engineers to maintain and monitor a complex infrastructure in production.
Most importantly, moving to microservices doesn’t necessarily make the overall system more modular. It’s still possible to have tight coupling between services. Just because there’s a network boundary between two pieces of code, doesn’t automatically make them loosely coupled. It’s all too easy to end up with a distributed monolith: all of the pain of spaghetti code, with the added fun of network latency/errors and reasoning about a distributed system.
Rather than jumping from a monolith straight into microservices, it makes more sense to refactor and modularize the monolith first. Even without a network boundary between modules, it’s possible to write loosely-coupled code that communicates over clean, well-defined interfaces that operate on simple data types. This approach is called the modular monolith, and I think it has the best attributes of microservices and monoliths:

- like microservices: loosely-coupled modules that can be developed, tested, and reasoned about independently
- like a monolith: a single language and toolchain, a simple CI/deployment pipeline, and no network latency or errors between modules
Additionally, a modular monolith is a great intermediate milestone for potentially breaking out modules into separate services. By first establishing a clean interface between modules, the process of pulling out the code and switching the interface to network calls becomes much easier. And the decision to pull out a service can be based on reasons other than improving modularity (such as resource utilization or team organization).
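To make the idea concrete, here is a minimal sketch of what a clean module interface could look like in Python. The names (`ProSummary`, `get_pro_summary`) are hypothetical, not from our codebase; the point is that callers in other modules import only a service function and a plain dataclass, never the underlying Django model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProSummary:
    """Public data type: a simple, immutable value other modules may use."""
    id: int
    name: str


# Stand-in for a database lookup; in a real module this would query a
# private Django model that no other module is allowed to import.
_FAKE_DB = {1: "Alex"}


def get_pro_summary(pro_id: int) -> ProSummary:
    """Public service: returns a plain data type, not a model instance."""
    return ProSummary(id=pro_id, name=_FAKE_DB[pro_id])


summary = get_pro_summary(1)
```

Because the interface passes only simple data, later swapping this function call for a network call (if the module is pulled out into a service) would change the implementation, not the callers.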
We made the decision to evolve our monolith into a modular monolith. Unfortunately, the Django framework encourages tight coupling between database models, serializers, and views. While that coupling is great for building features quickly, it was now a hindrance to our modularization efforts. To encourage a new style of development, our platform team introduced Modularization Guidelines, an internal document of best practices for writing code in the modular monolith style, covering topics like how to structure a module’s public interface and how to pass simple data types between modules instead of database models.
We evangelized these guidelines throughout the team and tracked adoption. And indeed, new code was being written in a more modular fashion. But we soon realized that a set of guidelines was not enough.
Guidelines were not enough; we needed enforcement and insights, too.
Enter Tach, an open-source tool to enforce modularity in large codebases. Tach is developed by Gauge, whose mission is to untangle tightly coupled monoliths. Getting started with Tach is easy: simply `pip install tach`, and run `tach mod` to mark the modules in your codebase. This automatically generates a config, which you can sync to your existing dependency state with `tach sync`.
```toml
[[modules]]
path = "apps.pro"
depends_on = [
    { path = "apps.booking" },
    { path = "apps.content" },
    { path = "apps.pricing" },
    { path = "apps.shift_requirements" },
    { path = "backend" },
]
```
Above is an example of a module in our monolith. This configuration in tach.toml means that any import in the apps.pro module from a module not declared in depends_on will be flagged as an error.
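To illustrate the semantics of depends_on, here is a toy sketch (not Tach’s actual implementation) of the kind of check it performs, using a simplified in-memory representation of the config above:

```python
# Simplified stand-in for the tach.toml config: each module maps to the
# set of modules it has declared as dependencies.
declared = {
    "apps.pro": {
        "apps.booking",
        "apps.content",
        "apps.pricing",
        "apps.shift_requirements",
        "backend",
    },
}


def check_import(importer: str, imported: str) -> bool:
    """Return True if `importer` may import from `imported`'s subtree."""
    allowed = declared.get(importer, set())
    return any(
        imported == dep or imported.startswith(dep + ".")
        for dep in allowed
    )


# apps.pro may import from its declared dependency apps.booking...
assert check_import("apps.pro", "apps.booking.services")
# ...but an undeclared dependency (apps.payments is hypothetical) fails.
assert not check_import("apps.pro", "apps.payments.models")
```

The real tool walks the import graph of the codebase and reports every edge that fails this kind of check.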
Additionally, Tach lets us define interface rules that reflect our modularization guidelines:
```toml
[[interfaces]]
expose = [
    "services.*",
    "selectors.*",
    "constants.*",
    ".*dataclasses.*",
    ".*types.*",
]
```
This example shows which members can be imported from a module. Services and selectors refer to the public methods of a module, while constants, dataclasses, and types represent the public data types used in those methods. If an engineer tries to import anything else from the module, e.g. a database model, Tach will automatically throw an error.
We’ve appreciated how easy Tach has been to set up, run locally, and adopt across the team so far.
In addition to using Tach locally, we’ve been using a new service from Gauge that integrates these checks across the entire team. Their platform gives us automatic CI checks in GitHub, a web UI that shows modularity violations with proposed fixes, and key insights such as a modularity score tracked over time.
The team has continued to ship updates that more effectively surface the insights we need. As we dial in our enforcement configuration, we expect the platform to function as a guide toward low-hanging fruit for active refactoring. In particular, we anticipate integrating our codemod-based approach to refactors with the insights from Gauge in the future.
We’ve only been using Tach and Gauge’s web platform for a short time, but the enforcement and insights have been a night-and-day improvement for our modularization efforts.
Now that the team has exposure to Tach, we’re making the CI status checks required. This will ensure that code changes improve our modularity rather than make the problem worse. We expect to see significant improvements to developer velocity and platform reliability as we embrace the modular monolith. Stay tuned for updates on our progress!
Tackling the Monolith/Microservices Dilemma at Instawork was originally published in Instawork Engineering on Medium.