On 2020-06-24 at 00:38:39, shejan shuza wrote:
> Hi, I have a repository with about 55GB of contents, with binary files
> that are less than 100MB each (so no LFS mode) from a project which
> has almost filled up an entire hard drive. I am trying to add all of
> the contents to a git repo and push it to GitHub but every time I do
>
> git add .
>
> in the folder with my contents after initializing and setting my
> remote, git starts caching all the files to .git/objects, making the
> .git folder grow in size rapidly. All the files are binaries, so git
> cannot stage changes between versions anyway, so there is no reason to
> cache versions.

What you're experiencing is normal; storing files in the .git directory is how Git keeps track of them. It can't rely on the copies in your working tree because you can modify those files at any time, and if you did, relying on them would corrupt the repository.

Note also that Git can and does deltify changes between revisions once the data is packed, regardless of whether the file is binary, but how well it does so depends on your data. For example, if the data is compressed, it likely won't deltify very well, so storing things like compressed images or zip files using deflate is generally going to result in a bloated repository.

However, if you don't need versioning for these files, then you don't need a Git repository. Git is a tool for versioning files, not a general storage mechanism. You may find that a cloud storage bucket or some other artifact storage service meets your needs better.

I will also tell you from a practical point of view that almost nobody (including you) will want to host a 55 GB repository filled with binary blobs. Repacking these repositories is usually very expensive, requiring extensive CPU and memory for a prolonged time with little useful benefit.
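To make the point about .git/objects concrete, here is a minimal Python sketch of the loose object that `git add` writes for one file. It assumes a default SHA-1 repository and no clean filters or line-ending conversion; the function name `loose_blob` is hypothetical. A blob is the header `blob <size>\0` followed by the file's bytes, its ID is the SHA-1 of that whole string, and it is stored zlib-compressed under .git/objects. Git's compression level is configurable, so the stored bytes may not match Git's output byte for byte, though the object ID should.

```python
import hashlib
import os
import zlib
from typing import Tuple


def loose_blob(content: bytes) -> Tuple[str, str, bytes]:
    """Sketch of the loose object `git add` writes for one file's content.

    Assumes a SHA-1 repository and no clean filters or line-ending
    conversion. Returns (object_id, path_under_worktree, stored_bytes).
    """
    # A blob object is the header b"blob <size>\0" followed by the raw bytes.
    obj = b"blob %d\x00" % len(content) + content
    # The object ID is the SHA-1 of the header plus content.
    oid = hashlib.sha1(obj).hexdigest()
    # Loose objects live at .git/objects/<first 2 hex digits>/<other 38>.
    path = os.path.join(".git", "objects", oid[:2], oid[2:])
    # On disk the object is zlib-compressed; Git's actual level is
    # configurable, so these bytes may differ from Git's own output.
    return oid, path, zlib.compress(obj)


if __name__ == "__main__":
    # Random bytes stand in for already-compressed binaries (JPEGs, zips):
    # zlib can't shrink them, so the stored copy is about as big as the file.
    data = os.urandom(1_000_000)
    oid, path, stored = loose_blob(data)
    print(path)
    print(f"{len(data)} bytes added, {len(stored)} bytes stored")
```

For a real file, `git hash-object <file>` should print the same ID this sketch computes. The random-data case shows why .git grows by roughly the size of already-compressed content: zlib has nothing left to squeeze out.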
> Is there any way, such as editing the git attributes or changing
> something about how files are staged in the git repository, to only
> just add indexes or references to files in the repository rather than
> cache them into the .git folder, while also being able to push all the
> data to GitHub?

This is how Git LFS and similar tools, like git-annex, work. Git LFS will create copies of the objects in your .git directory, though, at least until they're pushed to the server, at which point they can be pruned. In that respect, Git LFS has the same limitation as Git. I'm less familiar with git-annex, but it is also a popular choice.

However, as mentioned, it sounds like you don't need versioning at all, so unless you do, Git with Git LFS will be no more suitable for this than plain Git. If that's the case, I encourage you to explore alternative solutions.

-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204