Re: [PATCH 4/4] gc docs: downplay the usefulness of --aggressive

On Mon, Mar 18, 2019 at 11:13:57PM +0100, Ævar Arnfjörð Bjarmason wrote:

> What happened here is that I'd entirely forgotten about your 07e7dbf0db
> and in skimming while writing this thought we were still picking larger
> depth values, which we aren't.
> 
> I'll fix that, and see that gc.aggressiveDepth also needs to be changed
> to note that the depth it's now using as "aggressive" is just the
> default of 50 you'd get without --aggressive.

Yeah. I think I tweaked the documentation in that commit, but I agree
it's probably worth calling out the subtlety that it's the same as the
default.
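(To illustrate the subtlety with a throwaway repo -- a sketch, with the depth/window values as documented at the time:)

```shell
# Sketch: "git gc --aggressive" now repacks at the same delta depth (50)
# as a plain repack; only the window (250 vs. 10) and -f differ.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q
git -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m init
git gc --aggressive --quiet       # effectively repack -f --depth=50 --window=250
git repack -adq --depth=50        # a plain repack already uses depth 50
git count-objects -v | grep '^packs:'
```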

> > It is possible, if it finds more deltas due to the larger window, that
> > we'd spend more time accessing those deltas. But if the chains aren't
> > long, the base cache tends to perform well, and delta reconstruction is
> > about the same cost as zlib inflating. And we have a smaller disk cache
> > footprint.
> 
> I haven't tested that but suspect it won't matter. We do spend a *lot*
> more time though, so that still needs to be noted...

Yeah, agreed on both counts.
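(Concretely -- a sketch with a toy repo; the file contents and sizes are
stand-ins -- the extra time comes from the window, not the depth:)

```shell
# Sketch: the depth is the same in both repacks below; --window is where
# an aggressive repack spends its extra CPU, searching more candidate
# base objects per object for a better delta.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q
for i in 1 2 3 4 5; do
  seq 1 $((i * 200)) > file
  git add file
  git -c user.name=t -c user.email=t@example.com commit -qm "rev $i"
done
git repack -adq --depth=50 --window=10    # default-style repack
git repack -adqf --depth=50 --window=250  # aggressive-style: same depth, wider window
git count-objects -v | grep '^packs:'
```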

> On the topic of other things I may have screwed up, is this:
> 
>     +The effects of this option are persistent to the extent that
>     +`gc.autoPackLimit` and friends don't cause a consolidation of existing
>     +pack(s) generated with this option.
> 
> Actually wrong, since we don't usually pass -f, and thus a one-off
> --aggressive would live on for the objects involved in that run, even
> if we later consolidate?
> 
> From the docs it seems so, but I'd like to confirm...

In general, yeah, I'd expect an --aggressive repack's effects to live on
through subsequent gc's. It's not _entirely_ true, because objects from
that big repack may end up duplicated in another pack (e.g., due to
thin-fixing, or just a client which sends objects we didn't need due to
push's abbreviated negotiation). And then either:

  - we may select the copy of the object from the other pack, where it's
    a base object, and then end up looking for a new delta for it

  - after the pack-mru patches from ~2016, we don't have a strict
    ordering of the packs, which means we can see cycles in the delta
    graph. So even if object A isn't duplicated, it may be a delta on B,
    which deltas on C, and then the copy of C we pick is from another
    pack where it's a delta on A. We have to break the cycle, which
    could happen on any one of A, B, or C.

I haven't done careful measurements, but I'd be surprised if those cases
make a significant dent, even over many gc's. What I think probably does
make a dent is that new objects come into the repo with whatever crappy
packing the client did as part of the push, and you'd ideally like to
throw away all of their deltas and just find new good ones.

I think it might help for pack-objects to have a mode that isn't quite
"keep the big pack", but rather "keep the deltas from the big pack, but
not other ones, but otherwise create a new big pack". But this has
diverged pretty far from the point of your series. :)
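(For contrast, the existing "keep the big pack" behavior can already be
approximated with .keep files -- a sketch with a throwaway repo:)

```shell
# Sketch: a .keep file marks a pack as precious; "git repack -a -d"
# leaves it (and all of its deltas) alone and packs everything else
# into a new pack alongside it.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m one
git repack -adq                        # everything goes into one "big" pack
for p in .git/objects/pack/pack-*.pack; do
  : > "${p%.pack}.keep"                # mark that pack as kept
done
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m two
git repack -adq                        # new objects land in a second pack
git count-objects -v | grep '^packs:'
```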

-Peff


