Having an effective monorepo at scale will also require an entire infrastructure to solve all the problems that a poly-repo must solve and more. In particular,
- Partial download, as a monorepo will quickly grow too large for a single person to download. This is trivial for poly-repo but requires a dedicated system for monorepo.
- Dependency management. With a decently sized monorepo, one can't compile and test everything. So, someone needs to build a dependency manager to track all the DAGs, and build only the DAGs that are impacted by a commit. One also has to build a tracking mechanism for deploying different build artifacts, because teams may deploy their build artifacts on different dates and at different times. We will need more sophisticated build tools too.
- Build infrastructure. Even with a perfect dependency-tracking system, we may still end up building so much source code that we need to build it in parallel.
- Directory-level access control. This is also trivial for poly-repo since the granularity is at repo level, but it requires a dedicated implementation for a monorepo.
I'm not sure if the marginal benefit of having a monorepo can justify the investment for most companies. Google created its monorepo initially to manage the dependencies of C++ code, and Perforce already supported partial downloads. But with more modern languages that have their own way of managing dependencies? I'm not so sure about the benefits. Making refactoring easier? How many repos are really shared at source level across multiple teams in a company? Encouraging sharing source and therefore knowledge? Isn't that a solved problem? Any decent company allows searching source code at a semantic level across multiple repos. If I want to see the source code of a particular package in my IDE, it's just a click away. Note that I'm emphasizing the marginal return of a monorepo. Case in point: Google maintains the very useful Guava library, which is probably used by millions of engineers. Does it lead to the pain of incompatibility errors at runtime across different releases? Absolutely. Is it worth changing my poly-repo to a monorepo to solve the problem? I highly doubt it. The compatibility issue happens rarely given a good testing setup. When I do need to migrate my code, the cost is bi-modal: either the refactoring is trivial, or it requires serious testing and design changes, which a monorepo will not help with anyway.
Note I'm not saying that monorepo is not useful. Instead, I question how many companies will benefit from switching to monorepo, which may lead to the discussion on the potential market share of Diversion.
Polyrepo is such a pain in the ass though. At a smaller scale a monorepo is much, much nicer, and then when we get big enough to hit all those scaling problems, we'll be able to afford it by hiring a team of 3 to go implement Bazel/Buck2/..., and perhaps switching from Git to Diversion.
It’s not fully open source yet (ie you can’t use it yourself I think) but Facebook’s EdenFS project solves the partial checkout problem.
The queries in Bazel/Buck to figure out the changed set of dependencies probably aren’t complicated, and that’s why there’s no turnkey solution? You do need to adopt a build system with precise dependency tracking (afaik only Buck and Bazel support that) or the monorepo path isn’t going to be very successful.
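To make the "changed set of dependencies" idea concrete, here is a rough sketch in plain Python of what such a query boils down to: invert the dependency graph and walk it outward from the targets that own the changed files. This is not how Bazel or Buck implement it; the target names, the `DEPS` and `OWNERS` tables, and the `affected_targets` helper are all made up for illustration. As far as I know the Bazel query language exposes the same idea through its `rdeps` function.

```python
from collections import defaultdict, deque


def affected_targets(deps, owners, changed_files):
    """Return every target that must be rebuilt for a set of changed files.

    deps:   target -> list of targets it depends on (hypothetical build graph)
    owners: file path -> target that owns the file (hypothetical mapping)
    """
    # Invert the edges so we can walk from a target to its dependents.
    rdeps = defaultdict(set)
    for target, direct_deps in deps.items():
        for dep in direct_deps:
            rdeps[dep].add(target)

    # Seed with the targets that directly own a changed file.
    seeds = {owners[f] for f in changed_files if f in owners}

    # Breadth-first walk over reverse dependencies.
    affected = set(seeds)
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        for dependent in rdeps[current]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical build graph, for illustration only.
DEPS = {
    "//lib/core": [],
    "//lib/net": ["//lib/core"],
    "//app/server": ["//lib/net"],
    "//app/cli": ["//lib/core"],
    "//tools/fmt": [],
}
OWNERS = {
    "lib/core/util.py": "//lib/core",
    "lib/net/socket.py": "//lib/net",
    "tools/fmt/main.py": "//tools/fmt",
}

if __name__ == "__main__":
    print(sorted(affected_targets(DEPS, OWNERS, ["lib/net/socket.py"])))
```

The traversal itself is the easy part. What makes it work in practice is that the build system has precise, declared dependencies to traverse in the first place.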