Re: Git for games working group

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Sun, 16 Sep 2018 16:55:13 +0200

On Sat, Sep 15 2018, Taylor Blau wrote:

> On Fri, Sep 14, 2018 at 02:09:12PM -0700, John Austin wrote:
>> I've been working myself on strategies for handling binary conflicts,
>> and particularly how to do it in a git-friendly way (ie. avoiding as
>> much centralization as possible and playing into the commit/branching
>> model of git).
>
> Git LFS handles conflict resolution and merging over binary files with
> two primary mechanisms: (1) file locking, and (2) use of a merge-tool.
>
>   1. is the most "non-Git-friendly" solution, since it requires the use
>      of a centralized Git LFS server (to be run alongside your remote
>      repository) and that every clone phones home to make sure that they
>      are OK to acquire a lock.
>
>      The workflow that we expect is that users will run 'git lfs lock
>      /path/to/file' any time they want to make a change to an
>      unmeregeable file, and that this call first checks to make sure
>      that they are the only person who would hold the lock.
>
>      We also periodically "sync" the state of locks locally with those
>      on the remote, namely during the post-merge, post-commit, and
>      post-checkout hook(s).
>
>      Users are expected to perform the 'git lfs unlock /path/to/file'
>      anytime they "merge" their changes back into master, but the
>      thought is that servers could be taught to automatically do this
>      upon the remote detecting the merge.
>
>   2. is a more it-friendly approach, i.e., that the 'git mergetool'
>      builtin does work with files tracked under Git LFS, i.e., that both
>      sides of the merge are filtered so that the mergetool can resolve
>      the changes in the large files instead of the textual pointers.
>
>
>> I've got to a loose design that I like, but it'd be good to get some
>> feedback, as well as hearing what other game devs would want in a
>> binary conflict system.
>
> Please do share, and I would be happy to provide feedback (and make
> proposals to integrate favorable parts of your ideas into Git LFS).

All of this is obviously correct as far as git-lfs goes. Just to use
this as a jump-off comment on the topic of file locking and to frame
this discussion more generally.

It's true that a tool like git-lfs "requires the use of a centralized
[...] server" for file locking, but it's not the case that a feature
like file locking requires a centralized authority.

In particular, git-lfs unlike git-annex (which preceded it) does the
opposite of (to quote John upthread) "avoid[...] as much centralization
as possible", it *is* explicitly a centralized large file solution, not
a distributed one, as opposed to git-annex.

That's not a critique of git-lfs or the centralized method, or a
recommendation for decentralization in this context, but we already have
a similar distributed solution in the form of git-annex, it's just a hop
skip and a jump away from changing "who has the file" to "who has the
lock".

So how does that work? In the centralized case like
git-lfs/cvs/p4/whatever you have some "lock/unlock" command, and it
locks a file on a central server, locking is usually a a [locked?, who]
state of "is it locked" and "who locked it?". Usually this is also
followed-up on the client-side by checking those files out without the
"w" flag.

In the hypothetical git-annex-like case (simplifying a bit for the
purposes this explanation), for every FILE in your tree you have a
corresponding FILE.lock file, but it's not a boolean, but a log of who's
asked for locks, i.e. lines of:

    <repository UUID> <ts> <state> <who (email?)> <explanation?>

E.g.:

    $ cat Makefile.lock
    my-random-per-repo-id 2018-09-15 1 avarab@xxxxxxxxx "refactoring all Makefiles"
    my-random-per-repo-id 2018-09-16 0 avarab@xxxxxxxxx "done!"

This log is append-only, when clients encounter conflicts there's a
merge driver to ensure that all updates are kept.

You can then enact a policy saying you care or don't care about updates
from certain sources, or ignore locks older than so-and-so.

None of this is stuff I'd really recommend. It's just instructive to
point out that if someone wants a distributed locking solution for git,
it pretty much already exists, you can even (ab)use git-annex for it
today with a tiny hack on top.

I.e. each time you want to lock a file called Makefile just:

    echo We created a lock for this >Makefile.lock &&
    git annex add Makefile.lock &&
    git annex sync

And to release the lock:

    git annex rm Makefile.lock &&
    git annex sync

Then you and others using this just mentally pretend (or setup aliases)
that the following mapping exists:

    git annex get <file> && git annex sync ==> git lockit <file>
    git annex rm <file>  && git annex sync ==> git unlockit <file>

And that stuff like "git annex whereis" (designed to list "who has the
files") means "git annex who-has-locks".

Then you'd change the post-{checkout,merge} hooks to list the locks
"tracked annex files", chmod -w appropriately, and voila, a distributed
locking solution for git built on top of an existing tool you can
implement in a couple of hours.

Now, if I were in a game studio like this would I do any of this? Nope,
I think even if you go for locks something like the centralized git-lfs
approach is simpler and probably more appropriate (you presumably want
to be centralized anyway).

But to be honest I don't really get the need for this given something
like the use-case noted upthread:

    > John Austin <john@xxxxxxxxxxxxxxxxxxxx> wrote:
    > An essential example would be a team of 5 audio designers working
    > together on the SFX for a game. If one designer wants to add a layer
    > of ambience to 40% of the .wav files, they have to coordinate with
    > everyone else on the project manually.

If you have 5 people working on a project together, isn't it more
straightforward to post in IRC/E-Mail:

    Hey @all, don't change *.wav files for the next couple of days,
    major refactoring.

That's what we do all the time over in the non-game-non-binary-assets SW
development world, and I daresay that even if you have textual
conflicts, they're sometimes just as hard to solve.

I.e. you can have two people unaware of each other on a team starting to
in parallel refactor the same set of code in two completely different
ways, needing a lot of manual merging / throwing out of most of one
implementation. The way that's usually dealt with is something like the
above example post to a ML.

But maybe I'm just not imagining the use-cases.