RE: EXT :Re: GIT and large files


 



The files in question would be in a directory containing many files, some small and others huge (for example: text files, docs, and JPGs are MBs, but executables and OVA images are GBs).

Lou

From: Gary Fixler [mailto:gfixler@xxxxxxxxx] 
Sent: Tuesday, May 20, 2014 12:09 PM
To: Stewart, Louis (IS)
Cc: git@xxxxxxxxxxxxxxx
Subject: EXT :Re: GIT and large files

Technically yes, but from a practical standpoint, not really. Facebook recently revealed that they have a 54GB git repo[1], but I doubt it has 20+GB files in it. I've put 18GB of photos into a git repo, but everything about the process was fairly painful, and I don't plan to do it again.
Are your files non-mergeable binaries (e.g. videos)? The biggest problem here is with branching and merging. Conflict resolution with non-mergeable assets ends up as an us-vs-them fight, and I don't understand all of the particulars of that. From git's standpoint it's simple - you just have to choose one or the other. From a workflow standpoint, you end up causing trouble if two people have changed an asset and both people consider their change important. Centralized systems get around this problem with locks.
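(The "choose one or the other" part is real git today: during a conflicted merge you can pick a side with `git checkout --ours` or `--theirs`. A quick sketch in a throwaway repo - the file name asset.bin is just a placeholder:)

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.email demo@example.com
git config user.name demo

# Base version of a "binary" asset, then diverging changes on two branches.
printf 'v1' > asset.bin
git add asset.bin && git commit -qm base
git checkout -qb feature
printf 'theirs' > asset.bin
git commit -qam 'feature change'
git checkout -q -              # back to the initial branch
printf 'ours' > asset.bin
git commit -qam 'our change'

# Merging conflicts on asset.bin; git can't text-merge a binary,
# so we resolve by keeping our side wholesale.
git merge feature || true
git checkout --ours asset.bin
git add asset.bin
git commit -qm merged
cat asset.bin                  # our version survived
```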
Git could do this, and I've thought about it quite a bit. I work in games - we have code, but also a lot of binaries that I'd like to keep in sync with the code. For a while I considered suggesting some ideas to this group, but I'm pretty sure the locking issue makes it a non-starter.

The basic idea - skipping locking for the moment - would be to allow setting git attributes by file type, file size threshold, folder, etc., to let git know that some files are considered "bigfiles." These could be placed into the objects folder, but I'd actually prefer they go into a .git/bigfile folder. They'd still be saved as contents under their hash, but a normal git transfer wouldn't send them. They'd be in the tree as 'big' or 'bigfile' (instead of 'blob', 'tree', or 'commit' (for submodules)).

Git would warn you on push that there were bigfiles to send, and you could add, say, --with-big to also send them, or send them later with, say, `git push --big`. They'd simply be zipped up and sent over, without any packfile fanciness. When you clone, you wouldn't get the bigfiles unless you specified --with-big; it would warn you that there are also bigfiles, and tell you what command to run to also get them (`git fetch --big`, perhaps). Git status would always let you know if you were missing bigfiles. I think hopping around between commits would follow the same strategy: you'd always have to, e.g., `git checkout foo --with-big`, or `git checkout foo` and then `git update big` (or whatever - I'm not married to any of these names).
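(To make that concrete, the day-to-day flow might look something like this - every flag and message here is hypothetical, none of it exists in git:)

```
$ git push
warning: 3 bigfiles not sent; re-run with --with-big, or `git push --big` later
$ git push --big                 # zip up and send just the bigfiles

$ git clone <url> proj           # bigfiles skipped by default
warning: repository contains bigfiles; run `git fetch --big` to retrieve them

$ git checkout foo --with-big    # or: git checkout foo && git update big
```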

Resolving conflicts on merge would simply have to be up to you. It would be documented clearly that you're entering weird territory, and that your team has to deal with bigfiles somehow, perhaps with some suggested strategies ("Pass the conch?"). I could imagine some strategies for this. Maybe bigfiles require connecting to a blessed repo to grab the right to make a commit on it. That has many problems, of course, and now I can feel everyone reading this shifting uneasily in their seats :)
-g

[1] https://twitter.com/feross/status/459259593630433280

On Tue, May 20, 2014 at 8:37 AM, Stewart, Louis (IS) <louis.stewart@xxxxxxx> wrote:
Can GIT handle versioning of large 20+ GB files in a directory?

Lou Stewart
AOCWS Software Configuration Management
757-269-2388

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


