Re: Question: How to execute git-gc correctly on the git server

On Thu, Dec 08, 2022 at 01:35:04PM +0100, Ævar Arnfjörð Bjarmason wrote:
> >> The "cruft pack" facility does many different things, and my
> >> understanding of it is that GitHub's not using it only as an end-run
> >> around potential corruption issues, but that some not yet in tree
> >> patches on top of it allow more aggressive "gc" without the fear of
> >> corruption.
> >
> > I don't think cruft packs themselves help against corruption that much.
> > For many years, GitHub used "repack -k" to just never expire objects.
> > What cruft packs help with is:
> >
> >   1. They keep cruft objects out of the main pack, which reduces the
> >      costs of lookups and bitmaps for the main pack.

Peff isn't wrong here, but there is a big caveat which is that this is
only true when using a single pack bitmap. Single pack bitmaps are
guaranteed to have reachability closure over their objects, but writing
a MIDX bitmap after generating the MIDX does not afford us the same
guarantees.

So if you have a cruft pack which contains some unreachable object X,
and X is later made reachable by some other object that *is* reachable
from some reference and is included in one of the MIDX's packs, then we
won't have reachability closure unless we also bitmap the cruft pack,
too.

So even though it helps a lot with bitmapping in the single-pack case,
in practice it doesn't make a significant difference with multi-pack
bitmaps.
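
To make the closure problem concrete, here is a toy model in Python. It
is only a sketch: the object names, the `edges` map and the
`closure_gaps` helper are made up for illustration and are not how Git
represents packs or bitmaps internally.

```python
def closure_gaps(edges, covered):
    """Return objects referenced from `covered` that are not in `covered`.

    An empty result means `covered` is closed under reachability.
    """
    missing = set()
    for obj in covered:
        for child in edges.get(obj, ()):
            if child not in covered:
                missing.add(child)
    return missing


# Hypothetical object graph: commit C -> tree T -> blob X.
edges = {"C": ["T"], "T": ["X"]}

# X was unreachable when it was written to the cruft pack. C and T
# arrived later and landed in a pack that the MIDX covers.
midx_packs = {"C", "T"}
cruft_pack = {"X"}

print(closure_gaps(edges, midx_packs))               # {'X'}
print(closure_gaps(edges, midx_packs | cruft_pack))  # set()
```

In this model the MIDX's packs alone are missing X, and the gap only
goes away once the cruft pack is covered as well.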

> >   2. When you _do_ choose to expire, you can do so without worrying
> >      about accidentally exploding all of those old objects into loose
> >      ones (which is not wrong from a correctness point of view, but can
> >      have some amazingly bad performance characteristics).
> >
> > I think the bits you're thinking of on top are in v2.39. The "repack
> > --expire-to" option lets you write objects that _would_ be deleted into
> > a cruft pack, which can serve as a backup (but managing that is out of
> > scope for repack itself, so you have to roll your own strategy there).
>
> Yes, that's what I was referring to.

Yes, we use the `--expire-to` option when doing a pruning GC to move the
expired objects out of the repo to some "../backup.git" location. The
out-of-tree tooling that Ævar is speculating about basically runs
`cat-file --batch` in the backup repo, feeding it the list of missing
objects, and then writes those objects (back) into the GC'd repository.
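
A rough sketch of what such a recovery step could look like, in Python.
This is not the actual tooling: the `parse_batch_output` and
`restore_objects` helpers are hypothetical, and it assumes the backup
location is a usable Git repository, that full object IDs are passed
in, and that writing the objects back with `git hash-object -w` is
acceptable.

```python
import io
import subprocess


def parse_batch_output(stream):
    """Yield (oid, type, data) from `git cat-file --batch` output.

    Each object is "<oid> <type> <size>\\n<contents>\\n". Objects that
    are reported as "<oid> missing" are skipped.
    """
    while True:
        header = stream.readline()
        if not header:
            return
        fields = header.split()
        if len(fields) == 2 and fields[1] == b"missing":
            continue
        oid, obj_type, size = fields
        data = stream.read(int(size))
        stream.read(1)  # newline that follows the contents
        yield oid.decode(), obj_type.decode(), data


def restore_objects(backup_repo, target_repo, oids):
    """Copy the listed objects from backup_repo into target_repo.

    Sketch only: buffers all of the cat-file output in memory and
    writes each object back as a loose object.
    """
    request = "".join(oid + "\n" for oid in oids).encode()
    result = subprocess.run(
        ["git", "-C", backup_repo, "cat-file", "--batch"],
        input=request, stdout=subprocess.PIPE, check=True,
    )
    for _oid, obj_type, data in parse_batch_output(io.BytesIO(result.stdout)):
        subprocess.run(
            ["git", "-C", target_repo, "hash-object", "-w",
             "-t", obj_type, "--stdin"],
            input=data, stdout=subprocess.DEVNULL, check=True,
        )
```

Something like `restore_objects("../backup.git", ".", missing_oids)`
would then be run from the GC'd repository, where `missing_oids` is
whatever list of missing objects you have collected.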

> I think I had feedback on that series saying that if held correctly this
> would also nicely solve that long-time race. Maybe I'm just
> misremembering, but I (mis?)recalled that Taylor indicated that it was
> being used like that at GitHub.

It (the above) doesn't solve the race, but it does make it easier to
recover from a corrupt repository when we lose that race.

Thanks,
Taylor


