Re: worktrees vs. alternates

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Wed, 16 May 2018 12:33:20 +0200

On Wed, May 16 2018, Lars Schneider wrote:

>> On 16 May 2018, at 11:29, Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
>>
>>
>> On Wed, May 16 2018, Lars Schneider wrote:
>>
>>> I am looking into different options to cache Git repositories on build
>>> machines. The two most promising ways seem to be git-worktree [1] and
>>> git-alternates [2].
>>>
>>> I wonder if you see an advantage of one over the other?
>>>
>>> My impression is that git-worktree supersedes git-alternates. Would
>>> that be a fair statement? If yes, would it makes sense to deprecate
>>> alternates for simplification?
>>>
>>> [1] https://git-scm.com/docs/git-worktree
>>> [2] https://git-scm.com/docs/gitrepository-layout#gitrepository-layout-objectsinfoalternates
>>
>> It's not correct that worktrees supersede alternates, or the other way
>> around, they're orthagonal features.
>>
>> git-worktree allows you to create a new working directory connected to
>> the same local object store.
>>
>> Alternates allow you to declare in any given local object store, that
>> your set of objects isn't complete, and you can find the rest at some
>> other location, those object stores may or may not have more than one
>> worktree connected to them.
>
> OK. I just wonder in what situation I would work with an incomplete
> object store. The only use case I could imagine is that two repos share
> a common set of objects (most likely blobs). However, in that situation
> I would keep the two independent lines of development in a single repo
> with two root commits.
>
> Would it be fair to say that "git alternates" are a good mechanism to
> cache objects across different repos? However, I would consider a cache
> hit  between different repos unlikely. In that line of thinking
> "git worktree" would be a good (maybe better?) mechanism to cache objects
> for a single repo?

The use case is cloning with e.g. --shared or --reference.

Consider the following scenario:

 * You have 100 developers with *nix accounts on a single machine.

 * These 100 all need access to the same repo, but .git/objects is 1G

 * This would then naïvely require 100G of space + working tree. If the
   machine has 92G of RAM you'll be swapping the fscache in & out and
   performance will be horrible.

Instead, you have a single repository maintained on the system designed
to have all the alternates point to it, cloned as:

    git clone --reference /usr/share/git_tree/bigrepo ssh://....bigrepo.git ~/bigrepo

Now you're using just a bit over 1GB of space in total, but any new
objects the devs create will be written to their local .git dir, since
you're spending 1GB for those 100 repos instead of 100GB the data is
always in the FS cache.

And here's where this isn't at all like "worktree", each of those 100
will have their own "master" branch, and they can all create 100
different branches called "topic" that can be different.

With worktree the references are all shared across the same worktrees,
so it's designed for one dev working on different topic branches in
different checkouts.

The --reference feature is also commonly used in CI-like
environments. Imagine the above example, but except with 100 devs you
have CI jobs on the same machine being spun up all the time, although
here you get some overlap, if you're OK with the main branch name being
different you can also do this with worktrees instead of alternates.