Re: worktrees vs. alternates

On Wed, May 16 2018, Konstantin Ryabitsev wrote:

> On 05/16/18 09:02, Derrick Stolee wrote:
>> This is the biggest difference. You cannot have the same ref checked out
>> in multiple worktrees, as they both may edit that ref. The alternates
>> allow you to share data in a "read only" fashion. If you have one repo
>> that is the "base" repo that manages the objects dir, then that is
>> probably a good way to reduce the duplication. I'm not familiar with
>> what happens when a "child" repo does 'git gc' or 'git repack'; will it
>> delete the local objects that it sees exist in the alternate?
>
> The parent repo is not keeping track of any other repositories that may
> be using it for alternates, which is why you basically:
>
> 1. never run auto-gc in the parent repo
> 2. repack it manually using -Ad to keep loose objects that other repos
> may be borrowing (but we don't know if they are)
> 3. never prune the parent repo, because this may delete objects other
> repos are borrowing
>
> Very infrequently you may consider this extra set of maintenance steps:
>
> 1. Find every repo mentioning the parent repository in their alternates
> 2. Repack them without the -l switch (which copies all the borrowed
> objects into those repos)
> 3. Once all child repos have been repacked this way, prune the parent
> repo (it's safe now)
> 4. Repack child repos again, this time with the -l flag, to get your
> savings back.
>
> I would heartily love a way to teach git-repack to recognize when an
> object it's borrowing from the parent repo is in danger of being pruned.
> The cheapest way of doing this would probably be to hardlink loose
> objects into its own objects directory and only consider "safe" objects
> those that are part of the parent repository's pack. This should make
> alternates a lot safer, just in case git-prune happens to run by accident.
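
For illustration, the infrequent maintenance steps quoted above could be
planned along these lines. This is only a sketch under assumptions that
are mine, not Konstantin's: all repos are bare and live under one
directory, the helper names (borrows_from, find_children,
maintenance_plan) are made up, and I picked "-a -d" as the repack flags
since the mail only says "without -l" and "with -l". It builds the
command list without running anything.

```python
from pathlib import Path
from typing import List


def borrows_from(git_dir: Path, parent_objects: Path) -> bool:
    """True if git_dir lists parent_objects in objects/info/alternates."""
    alternates = git_dir / "objects" / "info" / "alternates"
    if not alternates.is_file():
        return False
    for line in alternates.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        entry = Path(line)
        if not entry.is_absolute():
            # Relative entries are relative to the borrowing repo's objects dir.
            entry = git_dir / "objects" / entry
        if entry.resolve() == parent_objects.resolve():
            return True
    return False


def find_children(root: Path, parent_git_dir: Path) -> List[Path]:
    """Step 1: find every repo under root that mentions the parent."""
    parent_objects = parent_git_dir / "objects"
    children = []
    for alternates in sorted(root.glob("**/objects/info/alternates")):
        git_dir = alternates.parents[2]  # strip objects/info/alternates
        if borrows_from(git_dir, parent_objects):
            children.append(git_dir)
    return children


def maintenance_plan(parent_git_dir: Path, children: List[Path]) -> List[List[str]]:
    """Steps 2-4 as an ordered list of argv lists (nothing is executed)."""
    plan = []
    # Step 2: repack children without -l so borrowed objects are copied in.
    for child in children:
        plan.append(["git", "-C", str(child), "repack", "-a", "-d"])
    # Step 3: only now is pruning the parent safe.
    plan.append(["git", "-C", str(parent_git_dir), "prune"])
    # Step 4: repack children with -l to drop the copies again.
    for child in children:
        plan.append(["git", "-C", str(child), "repack", "-a", "-d", "-l"])
    return plan
```

Each entry could then be handed to subprocess.run(cmd, check=True), in
order. The ordering is the whole point: the parent's prune must not run
until every child has finished its first repack.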

I may have missed some edge case, but I believe this entire workaround
isn't needed if you guarantee that the parent repo doesn't contain any
objects that will get un-referenced.

You'd do that in the common case by cloning with --single-branch and,
depending on your setup, --no-tags (if you delete tags). This assumes
that your HEAD branch points to something like a "master" that doesn't
get rewound.
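
As a rough sketch of that setup, the two clone invocations could be
built like this. The function names, the URL and the paths are
placeholders of mine, and using "git clone --reference" to point the
child at the parent is my assumption about how the alternates get set
up; the commands are only constructed, not run.

```python
from typing import List


def parent_clone_cmd(url: str, dest: str, keep_tags: bool = False) -> List[str]:
    """Clone only the HEAD branch, so the parent holds nothing that rewinds."""
    cmd = ["git", "clone", "--single-branch"]
    if not keep_tags:
        # Only needed if tags get deleted upstream.
        cmd.append("--no-tags")
    return cmd + [url, dest]


def child_clone_cmd(url: str, dest: str, parent: str) -> List[str]:
    """Full clone that borrows objects from the parent via alternates."""
    return ["git", "clone", "--reference", parent, url, dest]


# Hypothetical URL and paths, for illustration only.
print(parent_clone_cmd("https://example.org/project.git", "/srv/parent"))
print(child_clone_cmd("https://example.org/project.git", "/srv/child", "/srv/parent"))
```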

The problem you're describing happens if, say, you clone git.git and
keep the "pu" branch in the parent. Child repos then end up referencing
those objects, and when the parent GCs after "pu" is rewound, the child
repos break. Hence your elaborate workaround.

But that situation isn't possible in the first place if you only ever
import the "master" branch, or other references guaranteed not to
change.

Of course that has the trade-off that every child repo needs to get its
own objects for the "next" branch, "pu", etc. But those are
comparatively tiny.

I wasn't aware of -l (--local), or had forgotten about it. I thought
that we didn't have that, and that the "child" repos would just keep
growing over time, i.e. never get rid of objects that the parent also
ends up with later (say, if the parent is fetched in a daily cronjob).
Good to know that's not the case.

With that --local flag the trade-off of not fetching "next" and "pu"
etc. should become irrelevant over time: as those commits migrate to
"master" they'll get de-duplicated, or alternatively GC'd by the child
repos if they don't make it.
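
To make that concrete, here is a toy set model of what a child should
end up storing after a --local repack. It is not how git implements
anything; the function name and the object names are invented, and it
simply treats "reachable in the child" and "present in the parent" as
sets.

```python
def kept_after_local_repack(child_reachable, parent_objects):
    """Objects the child still stores itself: reachable, and not in the parent."""
    return set(child_reachable) - set(parent_objects)


# Before: "topic" only exists on "pu", which the parent does not fetch.
parent = {"base"}
child = {"base", "topic", "abandoned"}
before = kept_after_local_repack(child, parent)

# After: "topic" graduated to "master" and the parent fetched it, while
# "abandoned" was dropped from "pu", so the child no longer reaches it.
parent = {"base", "topic"}
child = {"base", "topic"}
after = kept_after_local_repack(child, parent)

print(sorted(before), sorted(after))
```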

>> GVFS uses alternates in this same way: we create a drive-wide "shared
>> object cache" that GVFS manages. We put our prefetch packs filled with
>> commits and trees in there, and any loose objects that are downloaded
>> via the object virtualization are placed as loose objects in the
>> alternate. We also store the multi-pack-index and commit-graph in that
>> alternate. This means that the only objects in each src dir are those
>> created by the developer doing their normal work.
>
> I'm very interested in GVFS, because it would certainly make my life
> easier maintaining source.codeaurora.org, which is many thousands of
> repos that are mostly forks of the same stuff. However, GVFS appears to
> only exist for Windows (hint-hint, nudge-nudge). :)

This should make you happy:

https://arstechnica.com/gadgets/2017/11/microsoft-and-github-team-up-to-take-git-virtual-file-system-to-macos-linux/

But I don't know what the current status is or where it can be followed.


