Hi, one of the maintainers of Git LFS here. I'm also a Git contributor, and I don't think it's a secret or in any way shocking to most of the Git contributors that the intended purpose of the smudge/clean filter functionality in Git was to perform simpler modifications on source code.
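To make that concrete, here's a toy example of the kind of "simpler modification" a filter driver was designed for; the script name, the keyword it rewrites, and the config lines in the comments are all hypothetical, not anything Git ships:

```python
#!/usr/bin/env python3
# keyword_filter.py -- hypothetical example of the "simpler modifications"
# clean/smudge filters were designed for: collapsing/expanding a keyword.
#
# Wiring (assumed setup, not part of this script):
#   .gitattributes:  *.txt filter=keyword
#   git config filter.keyword.clean  "python3 keyword_filter.py clean"
#   git config filter.keyword.smudge "python3 keyword_filter.py smudge"
import sys

def main() -> None:
    mode = sys.argv[1]  # "clean" (worktree -> repo) or "smudge" (repo -> worktree)
    data = sys.stdin.buffer.read()
    if mode == "clean":
        # Store a canonical placeholder so the committed blob stays stable.
        data = data.replace(b"$Date: expanded $", b"$Date$")
    elif mode == "smudge":
        # Expand the placeholder when the file is written to the working tree.
        data = data.replace(b"$Date$", b"$Date: expanded $")
    sys.stdout.buffer.write(data)

if __name__ == "__main__":
    main()
```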
However, one of the benefits of Git is that it is enormously flexible, and it can be and is successfully leveraged to provide large-file functionality using this mechanism. Before I maintained Git LFS or knew how it worked, I hypothesized that this would be the ideal mechanism for handling large files, so it shouldn't be surprising that the original developers decided to use it. We just need to be cognizant that, as with any design, there are going to be some limitations, which is what I was mentioning in that thread.
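Conceptually, LFS leverages that same hook by having the clean filter stash the real content in a separate object store and hand Git a small pointer file, with the smudge filter doing the reverse. Here's a rough sketch of the clean side, with the store path and sharding simplified; this is illustrative, not the actual git-lfs code:

```python
#!/usr/bin/env python3
# Rough sketch of what an LFS-style "clean" filter does: stash the real
# bytes under their SHA-256 and emit a tiny pointer file for Git to commit.
# Paths and layout are simplified; this is not the real git-lfs client.
import hashlib
import sys
from pathlib import Path

STORE = Path(".git/lfs/objects")  # simplified; git-lfs shards by hash prefix

def clean(data: bytes) -> bytes:
    oid = hashlib.sha256(data).hexdigest()
    obj = STORE / oid[:2] / oid[2:4] / oid
    obj.parent.mkdir(parents=True, exist_ok=True)
    obj.write_bytes(data)  # the large content stays out of Git's object database
    # This small pointer is what actually gets committed in place of the file.
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(data)}\n"
    ).encode()

if __name__ == "__main__":
    sys.stdout.buffer.write(clean(sys.stdin.buffer.read()))
```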
As a side note, it's intentional that we don't use hard links or symlinks into the LFS storage, because that makes it extremely easy to destroy or corrupt data by modifying the working tree, so Git's behavior here is actually helpful. There is copy-on-write functionality in Git LFS that can be used, if your file system supports it, to make the on-disk size of the repository a little less painful with large files.
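To make the hard-link concern concrete, here's a small standalone sketch (file names are made up) of why linking the working tree into the object store would be dangerous: both names share one inode, so an in-place edit to the checkout silently corrupts the stored object too.

```python
#!/usr/bin/env python3
# Why hard-linking the worktree file to the LFS object store is risky:
# both names share one inode, so editing the checkout corrupts the store.
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    store_copy = os.path.join(tmp, "lfs-object")    # stands in for .git/lfs/objects/...
    worktree_copy = os.path.join(tmp, "asset.bin")  # stands in for the checked-out file

    with open(store_copy, "wb") as f:
        f.write(b"original large-file contents")
    os.link(store_copy, worktree_copy)              # hard link: same inode, two names

    # A tool that edits the file in place (instead of replacing it) writes
    # through the shared inode...
    with open(worktree_copy, "r+b") as f:
        f.write(b"oops, edited")

    # ...and the "pristine" object in the store is now corrupted as well.
    with open(store_copy, "rb") as f:
        print(f.read())  # b'oops, editedge-file contents'
```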
Would love to hear some insights from game developers or others dealing with many binary media files. Is this what Git LFS is used for, or are there other popular tools in the scene?
I'm part of a compiler research group and we use Git LFS to maintain the history of the binary releases of our compiler (sac2c). Our perspective on this is that we can more closely relate the binaries to release notes, bug reports, etc. by leveraging systems like GitLab. To give you an idea, here's our repo: https://gitlab.science.ru.nl/sac-group/sac-packages/.
Can you suggest where those artifacts should be stored?
I often have projects that require third party libraries, and perhaps I change or upgrade the version of the library I use during the history of my project. I would like to have everything versioned together so if I check out a certain commit, I know that everything will work.
The downside of this is that my Git repo will get bloated over time if I keep changing large DLLs. I've sometimes made submodules to mitigate this problem. Git LFS also seemed like a good solution, but your comment makes me feel like I'm doing something bad here...
You could use a private feed on a package manager that supports arbitrary ZIP packages. You can (ab)use most package managers this way (npm, NuGet, Maven, etc.). Azure DevOps Artifacts also has a very simple, dumb "Universal Package" manager that just attaches a version number to a ZIP file and really doesn't care what is inside it.
Then you just use the usual sorts of package management tools (including locks) to keep everything versioned together.
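As a rough sketch of that workflow (the names, feed layout, and lock-file format here are hypothetical, not any particular package manager's), you'd publish the binaries as a versioned archive and commit only a pinned reference:

```python
#!/usr/bin/env python3
# Minimal sketch of the "versioned ZIP + lock file" idea: publish third-party
# binaries as versioned archives and pin them by hash. Names and layout are
# hypothetical, not any real package manager's format.
import hashlib
import json
import zipfile
from pathlib import Path

def publish(name: str, version: str, files: list[str], feed: Path) -> dict:
    feed.mkdir(parents=True, exist_ok=True)
    archive = feed / f"{name}-{version}.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            zf.write(f)
    digest = hashlib.sha256(archive.read_bytes()).hexdigest()
    # The lock entry is what you commit, instead of the binaries themselves.
    return {"name": name, "version": version, "sha256": digest}

if __name__ == "__main__":
    Path("vendor").mkdir(exist_ok=True)
    Path("vendor/foo.dll").write_bytes(b"\x00" * 1024)  # stand-in binary for the demo
    entry = publish("thirdparty-libs", "1.4.2", ["vendor/foo.dll"], Path("feed"))
    Path("deps.lock.json").write_text(json.dumps([entry], indent=2))
```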
> Can you suggest where those artifacts should be stored?
It depends on where you get them. If you're using a compiled object that you get from a third party and you don't have the source or don't build it yourself, then I'm not sure what to tell you. You're at the mercy of your vendor anyway, so my other arguments against it won't make a difference.
IMO it might be a good idea to store binary assets in Git from a "history management" point of view.
However, it's a mess when you have to deal with a lot of huge files. Not so much because of bad habits, but because it's hard to manage day-to-day and you will run into a lot of issues.
One of the issues I ran into recently is the time required to switch between branches when LFS-tracked files change. It comes down to how checkout and git-lfs interact (see the GitHub issue): it actually copies the files, and that operation takes time. Instead it could use a hard link or a symbolic link.
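As a rough illustration of why the copy hurts (a hypothetical benchmark script, not how git-lfs measures anything), a full copy scales with the size of the asset while a link is effectively instant:

```python
#!/usr/bin/env python3
# Rough illustration of why copying LFS objects into the worktree on
# checkout is slow compared to linking: a full copy scales with file size,
# a hard link does not. (Hypothetical benchmark, not git-lfs itself.)
import os
import shutil
import tempfile
import time

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "big-asset")
    with open(src, "wb") as f:
        f.write(os.urandom(256 * 1024 * 1024))  # 256 MiB stand-in for a large asset

    t0 = time.perf_counter()
    shutil.copy(src, os.path.join(tmp, "copy"))  # what a plain checkout effectively does
    t1 = time.perf_counter()
    os.link(src, os.path.join(tmp, "hardlink"))  # near-instant regardless of size
    t2 = time.perf_counter()

    print(f"copy:      {t1 - t0:.3f}s")
    print(f"hard link: {t2 - t1:.3f}s")
```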
> IMO it might be a good idea to store binary assets in Git from a "history management" point of view.
You can't diff them, just see that they changed. If you have versions in the filename of the object, that helps, but it isn't great from an auditing perspective. People change filenames, and if someone alters or tampers with the file it's hard to tell. You might argue that this is also possible if you fetch the file from somewhere or install a package and link against it, but there you have a better audit trail, and packages can (and should) also be signed.
Pulling resources in from other places still has to solve the same problems as git-lfs. What specifically makes resources stored elsewhere less of a bad-habit enabler?
Either you need (large, not suitable for inclusion or submodules) resources from outside the repo or you don't. If you don't, don't do it. If you do, git-lfs is to me a valid choice; the specific best choice depends on workflow details, and to me git-lfs doesn't have any obvious potential to be misused more than the alternatives.