
> Cloud-Native Git Alternative

Not sure if this is a good summary of the product. For one, cloud-native is an implementation detail, unless the company plans to sell the new VCS as packaged software instead of as a service. For two, I'm not sure how being cloud-native addresses any issue in my daily interaction with Git.

> The biggest drawback of Git is its limited scalability

I wonder how many people really have this problem. Millions of people have been using GitHub and GitLab. I'm curious about the percentage of users who feel that there is a scalability issue with their own repositories. Personally, I don't have any beef with Git's scalability at all, even though the companies I worked for had anywhere from hundreds to tens of thousands of engineers. Maybe having a monorepo will lead to a scalability problem? But monorepos are a debatable topic, to say the least.

> Diversion is built on top of distributed storage and databases, accessible via REST API, and runs on serverless cloud infrastructure. Every repository operation is an API call (commit, branch, merge etc.). The desktop client synchronizes all work in progress to the cloud in real time

Again, what does this have to do with me, a user? Why would I care about the underlying protocols when I simply use a CLI or a UI?



Monorepos are indeed causing problems with Git, and this is one of the main arguments against them (see [1]). Some companies are building their own solutions (Google, Meta), and some are splitting their monorepos because of these problems. IMO if a company wants to run a monorepo for its own reasons, it shouldn't be limited by its VCS.

The technical details are for the readers who want to know, I agree it's not really important for the users (most of them, at least).

[1] https://medium.com/@mattklein123/monorepos-please-dont-e9a27...


Having an effective monorepo at scale will also require an entire infrastructure to solve all the problems that a poly-repo must solve and more. In particular,

- Partial download, as a monorepo will quickly grow too large for a single person to download. This is trivial with poly-repos but requires a dedicated system for a monorepo.

- Dependency management. With a decently sized monorepo, one can't compile and test everything on every commit. So someone needs to build a dependency manager to track all the DAGs and build only the parts of the DAGs that are impacted by a commit. One also has to build a tracking mechanism for deploying different build artifacts, because a team may deploy different build artifacts at different dates and times. We will need more sophisticated build tools too.

- Build infrastructure. Even with a perfect dependency-tracking system, we may still end up with so much source code to build that the builds have to run in parallel.

- Directory-level access control. This is also trivial with poly-repos, since permissions are granted per repo, but it requires a dedicated implementation in a monorepo.
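A minimal sketch of what the directory-level access control point involves, assuming a hypothetical owner map (similar in spirit to a CODEOWNERS file) in which the most specific directory prefix wins. The paths, team names, and the "deepest prefix only, no inheritance" rule are illustrative assumptions, not how any particular VCS enforces permissions.

```python
from pathlib import PurePosixPath

# Hypothetical owner map: directory prefix -> teams allowed to commit there.
OWNERS: dict[str, set[str]] = {
    "": {"infra"},                       # repo-root fallback
    "services/payments": {"payments"},
    "services/payments/ledger": {"ledger"},
    "libs/common": {"platform", "infra"},
}


def owners_for(path: str, owners: dict[str, set[str]] = OWNERS) -> set[str]:
    """Return the owners of the deepest listed directory that contains `path`."""
    parts = PurePosixPath(path).parts
    # Walk from the file's parent directory up to the repo root ("").
    for depth in range(len(parts) - 1, -1, -1):
        prefix = "/".join(parts[:depth])
        if prefix in owners:
            return owners[prefix]
    return set()


def can_commit(team: str, changed_paths: list[str]) -> bool:
    """A commit is allowed only if the team owns every path it touches."""
    return all(team in owners_for(p) for p in changed_paths)
```

With this rule the payments team can commit to `services/payments/api.py` but not to anything under `services/payments/ledger/`. Whether parent owners should also inherit rights over subdirectories is exactly the kind of policy a monorepo has to decide and build, while a poly-repo setup gets it for free at repo granularity.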

I'm not sure if the marginal benefit of having a monorepo can justify the investment for most companies. Google initially created its monorepo to manage the dependencies of its C++ code, and Perforce already supported partial downloads. But with more modern languages that have their own dependency management? I'm not so sure about the benefits. Making refactoring easier? How many repos are really shared at the source level across multiple teams in a company? Encouraging the sharing of source code, and therefore knowledge? Isn't that a solved problem? Any decent company provides semantic source-code search across multiple repos. If I want to see the source code of a particular package in my IDE, it's just a click away. Note that I'm emphasizing the marginal return of a monorepo. Case in point: Google maintains the very useful Guava library, which is probably used by millions of engineers. Does it cause painful runtime incompatibility errors across different releases? Absolutely. Is it worth changing my poly-repo setup to a monorepo to solve the problem? I highly doubt it. Compatibility issues happen rarely given a good testing setup. When I do need to migrate my code, the cost is bimodal: either the refactoring is trivial, or it requires serious testing and design changes, which a monorepo will not help with anyway.

Note that I'm not saying a monorepo is not useful. Instead, I question how many companies will benefit from switching to a monorepo, which bears on Diversion's potential market share.


Polyrepo is such a pain in the ass though. At a smaller scale it's much, much nicer, and then when we get big enough to hit all those scaling problems, we'll be able to afford it by hiring a team of 3 to go implement Bazel/Buck2/..., and perhaps switching from Git to Diversion.


It’s not fully open source yet (i.e., I think you can’t use it yourself), but Facebook’s EdenFS project solves the partial checkout problem.

The queries in Bazel/Buck to figure out the changed set of dependencies probably aren’t complicated, and that’s why there’s no turnkey solution? You do need to adopt a build system with precise dependency tracking (afaik only Buck and Bazel support that), or the monorepo path isn’t going to be very successful.
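For a rough idea of what the "changed set" query involves, here is a sketch of a reverse-dependency traversal (plain breadth-first search) over a hypothetical target graph. It assumes the mapping from changed files to build targets has already been done, which is what precise dependency declarations in a build system like Bazel or Buck provide; Bazel's query language exposes the same idea through its `rdeps` function. The target names are made up.

```python
from collections import defaultdict, deque

# Hypothetical build graph: each target maps to the targets it depends on.
DEPS: dict[str, list[str]] = {
    "//app:server": ["//lib:http", "//lib:db"],
    "//app:cli": ["//lib:http"],
    "//lib:http": ["//lib:util"],
    "//lib:db": ["//lib:util"],
    "//lib:util": [],
}


def reverse_deps(deps: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert the graph: dependency -> targets that directly depend on it."""
    rdeps: dict[str, set[str]] = defaultdict(set)
    for target, inputs in deps.items():
        for dep in inputs:
            rdeps[dep].add(target)
    return rdeps


def affected_targets(changed: list[str], deps: dict[str, list[str]]) -> set[str]:
    """Changed targets plus everything that transitively depends on them."""
    rdeps = reverse_deps(deps)
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in rdeps.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen
```

A change to `//lib:db` only needs `//lib:db` and `//app:server` rebuilt and retested, while a change to `//lib:util` touches everything. The hard part in practice is keeping the declared graph precise, not the traversal.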


> Monorepos are indeed causing problems with git, and this is one of the main arguments against them (see [1]).

The article is a disappointing read. It spends a lot of time talking about monorepos and how they spell all sorts of trouble. Yet the article makes zero mention of submodules as a way to get the best of both worlds.


Submodules are great, but they're hardly an alternative to monorepos.


I've never seen positive feedback about submodules before


Why not? Just want to understand.


Submodule workflows have a lot of overhead at review time. During development it's fine, you work with the fully materialized tree just like it's a monorepo. But once you need to submit your changes for review, how does that workflow look?

1. Commit in submodule A, then get it reviewed and merged as SHA 123

2. Update submodule A to 123, get it reviewed

3. Reviewer has feedback on usage of new API in submodule A

4. Make another PR on A, at commit 456. This time don't merge it, since the reviewer on the main repo might have more feedback.

5. Update submodule to 456, push to existing PR

...??

Monorepo:

1. Make PR to monorepo

2. Get review feedback

3. Push changes to PR branch

4. Merge


> But once you need to submit your changes for review, how does that workflow look?

1. Post PR to submodule A. Get it merged.

2. Post PR to the main repo updating it to point to the new commit in submodule A.

Done.

The only difference between a monorepo and splitting the repo into submodules is that the main repo's history is coarser and basically tracks the output of integration tests. There is no need to overcomplicate things, and if you need to overthink them anyway then you have far more degrees of freedom to worry about in monorepos.


That’s a really slow review process. It also prevents reviewers from seeing the bigger picture of how step 1 manifests in step 2. In practice, what I’ve seen is that you end up with both reviews open simultaneously, each referencing the other in its description, and once they’re approved you merge 1 and update the pointer in 2 to point to the new merged commit if it changed.

That’s a lot of annoying and sometimes error-prone manual bookkeeping that has nothing to do with the engineering work itself.


Anything that cuts across submodule boundaries needs as many MRs as boundaries it crosses, and conflicting submodule pointer updates in the main repo require additional MRs (in the submodules) to resolve, plus coordination between those MRs.

They're basically fine for slowly-moving dependencies, vendoring, etc. but they emphatically do not solve the large-org many-team coordination problems that monorepos are meant to solve.

FWIW, git is a great monorepo platform for 1-10m lines of code (Linux, $MY_JOB, ...). It's only the very largest scales (Windows, Google3, ...) or asset heavy cases (ML, game dev) that need special treatment.
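To make the MR-counting point concrete, here is a hedged sketch that counts merge requests for one logical change under a flat submodule layout: one MR per touched submodule, plus one in the superproject to bump the pointers (that MR can also carry the superproject's own file changes). The submodule paths are hypothetical, and nested submodules, which would add a pointer bump at every level, are ignored. In a monorepo the same change is always a single MR.

```python
from pathlib import PurePosixPath

SUPERPROJECT = "<superproject>"


def repos_touched(changed_paths: list[str], submodule_roots: list[str]) -> set[str]:
    """Which repositories (flat submodules or the superproject) a change touches."""
    touched = set()
    for path in changed_paths:
        p = PurePosixPath(path)
        # Path-component matching, so "libs/ab/x.py" is NOT inside "libs/a".
        owner = next((root for root in submodule_roots if p.is_relative_to(root)), SUPERPROJECT)
        touched.add(owner)
    return touched


def merge_requests_needed(changed_paths: list[str], submodule_roots: list[str]) -> int:
    """One MR per touched submodule, plus one superproject MR to bump the pointers."""
    submodules = repos_touched(changed_paths, submodule_roots) - {SUPERPROJECT}
    return len(submodules) + 1 if submodules else 1
```

A change touching `libs/a`, `libs/b`, and `app/` with `libs/a` and `libs/b` as submodules needs three coordinated MRs; the same change in a monorepo needs one. (`PurePosixPath.is_relative_to` requires Python 3.9+.)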


Monorepos are a problem born of CI that can't cope with cross-project dependencies properly. People have solved the problem by lumping everything together, but it's the wrong answer.

Fix CI and the problem goes away.


Sorry, no. CI and monorepos are at best tangentially related. Dependency tracking across repos is a PITA that inhibits code reuse: Git submodules suck, as does whatever that alternative Git submodule concept is called (subtree?).

Package managers like Cargo and npm can help, but it's still an annoying dance to update dependencies in multiple downstream projects. And if you need to change an API, it's a 3-way orchestration: add the new API, update downstream dependencies, remove the old API.
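The "new API, update downstream, remove old API" dance is commonly known as expand/contract. A minimal sketch of step 1 in Python, keeping a deprecated shim so downstream repos don't break while they migrate; `fetch_user` and `get_user` and their bodies are made-up placeholders, not anyone's real API.

```python
import warnings


# Step 1 (expand): ship the new API alongside the old one.
def fetch_user(user_id: int, *, include_email: bool = False) -> dict:
    """New API (hypothetical) with an explicit keyword argument."""
    user = {"id": user_id, "name": f"user{user_id}"}
    if include_email:
        user["email"] = f"user{user_id}@example.com"
    return user


def get_user(user_id: int) -> dict:
    """Old API kept as a thin shim so downstream callers keep working for now."""
    warnings.warn(
        "get_user() is deprecated; use fetch_user()",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this shim
    )
    return fetch_user(user_id)

# Step 2 (migrate): each downstream repo bumps its dependency and switches to fetch_user().
# Step 3 (contract): once nothing calls get_user(), delete the shim in a later release.
```

In a poly-repo setup each of those three steps is at least one release plus one PR per downstream repo; in a monorepo steps 1 to 3 can land as a single atomic commit.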


That's exactly my point. Cross-project dependencies are exactly the problem I want CI to solve.


Like the CI system automatically pushing commits that update downstream projects referencing the upstream repo? Or something else?
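One reading of "CI solves cross-project dependencies" is a bot that, after an upstream release, opens dependency-bump PRs in every downstream repo in dependency order. A sketch of just the ordering step, assuming a hypothetical repo-level dependency graph; it uses `graphlib.TopologicalSorter` from the Python standard library (3.9+). How the bump PRs are actually created and merged is deliberately left out.

```python
from graphlib import TopologicalSorter

# Hypothetical repo-level graph: repo -> repos it depends on.
REPO_DEPS: dict[str, set[str]] = {
    "web-app": {"api-client", "ui-kit"},
    "api-client": {"core-lib"},
    "ui-kit": {"core-lib"},
    "batch-jobs": {"core-lib"},
    "docs-site": set(),
}


def downstream_update_order(changed: str, repo_deps: dict[str, set[str]]) -> list[str]:
    """Repos to bump after `changed` releases, each after the repos it depends on."""
    # Grow the set of repos that transitively depend on the changed one.
    affected = {changed}
    grew = True
    while grew:
        grew = False
        for repo, deps in repo_deps.items():
            if repo not in affected and deps & affected:
                affected.add(repo)
                grew = True
    # Topologically sort only the affected subgraph so upstream bumps land first.
    subgraph = {repo: repo_deps.get(repo, set()) & affected for repo in affected}
    order = list(TopologicalSorter(subgraph).static_order())
    return [repo for repo in order if repo != changed]
```

For a `core-lib` release this yields `api-client`, `ui-kit`, and `batch-jobs` (in some order) before `web-app`, which has to wait until both of its direct dependencies have been bumped.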


Re: scalability, in the very first sentence they mention game development, which deals with large numbers of large (and growing) assets, nowadays versioned, like 3D models, textures, animations, etc.


As someone who worked on the backend (workflow, infra) side of a game dev studio, I see a lot of massive benefits in this sort of "what if Dropbox but Git" product workflow.

We couldn't actually use git for our asset management, because when you're dealing with 1GB+ Photoshop files, versioning them with any reasonable granularity breaks your Git repository, makes clone times and local file storage requirements astronomical, and doesn't really make sense anyway. We ended up using SVN, since it only transfers what you check out and you can check out subtrees trivially, but then that required getting a GUI SVN client, providing it to our art team, teaching them how to use it, and then having them come to me whenever something in SVN got confused or broken (e.g. they opened and then closed a document and Photoshop updated the thumbnail, now there's a merge conflict and they can't commit).

We also ended up using Google Drive for a lot of stuff, and eventually migrating to Team Drives once that was a thing, but that doesn't integrate with... basically anything, honestly, or at least not with any reasonable degree of straightforwardness.

I don't work for that company anymore, but the thing that would make me most interested in this product would be:

1. Self-hosting it (would pay 'enterprise' rates for this); or

2. Being able to locally proxy/cache assets for users in the office, so that committing a 1 GB PSD didn't require 20 artists to all pull down 1 GB each from the server
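Item 2 is essentially a content-addressed read-through cache on the office network. A minimal in-memory sketch, assuming a hypothetical `fetch_from_server` callable that returns an asset's bytes given its SHA-256 hash; a real proxy would keep blobs on disk and stream them instead of holding 1 GB files in memory.

```python
import hashlib
from typing import Callable


class OfficeAssetCache:
    """Hypothetical LAN read-through cache for large binary assets, keyed by content hash.

    The first artist to request an asset pays the WAN download; everyone after that is
    served locally. Because keys are content hashes, an unchanged file is never
    re-fetched and an edited PSD automatically gets a new key.
    """

    def __init__(self, fetch_from_server: Callable[[str], bytes]):
        self._fetch = fetch_from_server  # assumed remote call: hash -> file bytes
        self._store: dict[str, bytes] = {}
        self.wan_fetches = 0

    def get(self, content_hash: str) -> bytes:
        if content_hash not in self._store:
            data = self._fetch(content_hash)
            # Verify integrity before caching: the bytes must match their key.
            if hashlib.sha256(data).hexdigest() != content_hash:
                raise ValueError(f"hash mismatch for {content_hash}")
            self._store[content_hash] = data
            self.wan_fetches += 1
        return self._store[content_hash]
```

With 20 artists calling `get()` for the same newly committed PSD, `wan_fetches` ends up at 1: the server sends the file once and the office LAN handles the other 19 copies.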

A lot of people seem to be comparing this to actual Git, but this doesn't replace Git unless you're using Git wrong; what it replaces is the absolute disaster of a workflow that a lot of companies have to try to build/use/teach internally.


Ah, I guess this is the curse of ignorance: I saw the sentence but didn't register its significance as I'm not familiar with what's required in game development.


[flagged]


Yeah, possibly. I only have visibility into the teams I've worked with. That's partly why product-market fit is hard to find, as it relies heavily on intuition. I'd be happy if I'm wrong.



