On Sat, Sep 15 2018, Taylor Blau wrote:

> On Fri, Sep 14, 2018 at 02:09:12PM -0700, John Austin wrote:
>> I've been working myself on strategies for handling binary conflicts,
>> and particularly how to do it in a git-friendly way (ie. avoiding as
>> much centralization as possible and playing into the commit/branching
>> model of git).
>
> Git LFS handles conflict resolution and merging over binary files with
> two primary mechanisms: (1) file locking, and (2) use of a merge-tool.
>
>   1. is the most "non-Git-friendly" solution, since it requires the use
>      of a centralized Git LFS server (to be run alongside your remote
>      repository) and that every clone phones home to make sure that they
>      are OK to acquire a lock.
>
>      The workflow that we expect is that users will run 'git lfs lock
>      /path/to/file' any time they want to make a change to an
>      unmergeable file, and that this call first checks to make sure
>      that they are the only person who would hold the lock.
>
>      We also periodically "sync" the state of locks locally with those
>      on the remote, namely during the post-merge, post-commit, and
>      post-checkout hook(s).
>
>      Users are expected to perform the 'git lfs unlock /path/to/file'
>      anytime they "merge" their changes back into master, but the
>      thought is that servers could be taught to automatically do this
>      upon the remote detecting the merge.
>
>   2. is a more Git-friendly approach, i.e., that the 'git mergetool'
>      builtin does work with files tracked under Git LFS, i.e., that both
>      sides of the merge are filtered so that the mergetool can resolve
>      the changes in the large files instead of the textual pointers.
>
>> I've got to a loose design that I like, but it'd be good to get some
>> feedback, as well as hearing what other game devs would want in a
>> binary conflict system.
>
> Please do share, and I would be happy to provide feedback (and make
> proposals to integrate favorable parts of your ideas into Git LFS).

All of this is obviously correct as far as git-lfs goes. I'm just using
it as a jumping-off point to comment on file locking and to frame this
discussion more generally.

It's true that a tool like git-lfs "requires the use of a centralized
[...] server" for file locking, but it's not the case that a feature
like file locking requires a centralized authority.

In particular git-lfs, unlike git-annex (which preceded it), does the
opposite of (to quote John upthread) "avoid[...] as much centralization
as possible": it *is* explicitly a centralized large-file solution, not
a distributed one.

That's not a critique of git-lfs or the centralized method, or a
recommendation for decentralization in this context, but we already have
a similar distributed solution in the form of git-annex. It's just a
hop, skip and a jump away from changing "who has the file" to "who has
the lock".

So how does that work? In the centralized case like
git-lfs/cvs/p4/whatever you have some "lock/unlock" command, and it
locks a file on a central server. The lock is usually a [locked?, who]
state of "is it locked?" and "who locked it?". Usually this is also
followed up on the client side by checking those files out without the
"w" flag.

In the hypothetical git-annex-like case (simplifying a bit for the
purposes of this explanation), for every FILE in your tree you have a
corresponding FILE.lock file. It's not a boolean, but a log of who's
asked for locks, i.e. lines of:

    <repository UUID> <ts> <state> <who (email?)> <explanation?>

E.g.:

    $ cat Makefile.lock
    my-random-per-repo-id 2018-09-15 1 avarab@xxxxxxxxx "refactoring all Makefiles"
    my-random-per-repo-id 2018-09-16 0 avarab@xxxxxxxxx "done!"

This log is append-only; when clients encounter conflicts there's a
merge driver to ensure that all updates are kept.

You can then enact a policy saying you care or don't care about updates
from certain sources, or ignore locks older than so-and-so.

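To make the log idea concrete, here is a minimal Python sketch of the three pieces described above: parsing FILE.lock lines, a union-style merge that keeps every update from both sides, and a local policy that ignores certain repositories or locks older than some age. Everything in it is an assumption for illustration: the names `LockEntry`, `parse_log`, `merge_logs` and `active_locks` are hypothetical, the timestamp is taken to be an ISO date as in the example, and the wiring of `merge_logs` into an actual git merge driver is not shown.

```python
import shlex
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class LockEntry:
    repo: str      # repository UUID
    ts: date       # day the entry was written (assumed ISO format)
    locked: bool   # True for state "1" (lock), False for "0" (unlock)
    who: str       # who asked, e.g. an e-mail address
    note: str = ""  # optional free-form explanation


def parse_log(text):
    """Parse the contents of a FILE.lock log into LockEntry objects."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # shlex keeps the quoted explanation together as one field
        fields = shlex.split(line)
        repo, ts, state, who = fields[:4]
        note = " ".join(fields[4:])
        entries.append(
            LockEntry(repo, date.fromisoformat(ts), state == "1", who, note))
    return entries


def merge_logs(ours, theirs):
    """Union of both sides so that no update is lost, ordered by date."""
    merged = list(dict.fromkeys(ours + theirs))  # drop exact duplicates
    return sorted(merged, key=lambda e: e.ts)    # stable for same-day entries


def active_locks(entries, today, max_age_days=None, ignored_repos=()):
    """Locks still held according to the latest entry per repository.

    The policy knobs are local: skip repositories we do not care about
    and, optionally, treat locks older than max_age_days as expired.
    """
    latest = {}
    for entry in sorted(entries, key=lambda e: e.ts):
        if entry.repo not in ignored_repos:
            latest[entry.repo] = entry  # later entries override earlier ones
    held = [e for e in latest.values() if e.locked]
    if max_age_days is not None:
        cutoff = today - timedelta(days=max_age_days)
        held = [e for e in held if e.ts >= cutoff]
    return held
```

With day-granularity timestamps a lock and an unlock written on the same day can only be ordered by their position in the log, so a real log would probably want a finer-grained timestamp.
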
None of this is stuff I'd really recommend. It's just instructive to
point out that if someone wants a distributed locking solution for git,
it pretty much already exists. You can even (ab)use git-annex for it
today with a tiny hack on top.

I.e. each time you want to lock a file called Makefile just:

    echo We created a lock for this >Makefile.lock &&
    git annex add Makefile.lock &&
    git annex sync

And to release the lock:

    git annex rm Makefile.lock &&
    git annex sync

Then you and others using this just mentally pretend (or set up aliases)
that the following mapping exists:

    git annex get <file> && git annex sync ==> git lockit <file>
    git annex rm <file> && git annex sync  ==> git unlockit <file>

And that stuff like "git annex whereis" (designed to list "who has the
files") means "git annex who-has-locks".

Then you'd change the post-{checkout,merge} hooks to list the locks
(i.e. the "tracked annex files"), chmod -w appropriately, and voila, a
distributed locking solution for git built on top of an existing tool,
which you can implement in a couple of hours.

Now, if I were in a game studio like this, would I do any of this? Nope.
I think even if you go for locks, something like the centralized git-lfs
approach is simpler and probably more appropriate (you presumably want
to be centralized anyway).

But to be honest I don't really get the need for this given something
like the use-case noted upthread:

> John Austin <john@xxxxxxxxxxxxxxxxxxxx> wrote:
> An essential example would be a team of 5 audio designers working
> together on the SFX for a game. If one designer wants to add a layer
> of ambience to 40% of the .wav files, they have to coordinate with
> everyone else on the project manually.

If you have 5 people working on a project together, isn't it more
straightforward to post in IRC/E-Mail:

    Hey @all, don't change *.wav files for the next couple of days,
    major refactoring.

That's what we do all the time over in the non-game-non-binary-assets SW
development world, and I daresay that even if you have textual
conflicts, they're sometimes just as hard to solve.

I.e. you can have two people on a team, unaware of each other, starting
to refactor the same set of code in parallel in two completely different
ways, needing a lot of manual merging / throwing out of most of one
implementation. The way that's usually dealt with is something like the
above example post to a ML.

But maybe I'm just not imagining the use-cases.