Re: [PATCH v2 3/3] refs/files: use heuristic to decide whether to repack with `--auto`

On Wed, Sep 04, 2024 at 08:24:44AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@xxxxxx> writes:
> 
> > Introduce a heuristic that decides whether or not it is worth packing
> > loose references. The important factors to decide here are the number of
> > loose references in comparison to the overall size of the "packed-refs"
> > file. The bigger the "packed-refs" file, the longer it takes to rewrite
> > it and thus we scale up the limit of allowed loose references before we
> > repack.
> >
> > As is the nature of heuristics, this mechanism isn't obviously
> > "correct", but should rather be seen as a tradeoff between how much
> > resources we spend packing refs and how inefficient the ref store
> > becomes. For what it's worth, we have been using the exact same
> > heuristic successfully in Gitaly for several years now.
> 
> This seems to hit the balance between the thoroughness of repack
> (leaving fewer loose refs is good) and the frequency (doing repack
> less often is good) in a quite sensible way.
> 
> I also have to wonder if it adds a good component to the heuristics
> to leave younger loose refs (wrt mtime) out of packed-refs, with the
> expectation that they are more likely to be updated again soon than
> refs that were written a long time ago and have kept the same value.

Maybe.

In general, I expect that most users typically only touch a very small
set of refs, e.g. the four or five feature branches that they have. So
even without such an additional component we would not end up repacking
all that often.

That picture changes once you consider remotes, because with bigger
teams it's quite likely that you'll get many ref updates there. I'm not
sure a time-based heuristic would be a good fit for this use case,
because I'd think that repacking those right away is sensible most of
the time.

We also have to consider that an mtime-based component makes the overall
system harder to understand and nondeterministic. Which isn't to say that
it doesn't make sense. But I rather think we should land the simple and
stupid solution first, and in case we see that it's insufficient we can
iterate and improve it in the future.
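
To illustrate what "simple" means here, below is a minimal sketch of a
size-scaled limit, written in Python for brevity rather than in C. It
only shows the shape of the idea from the commit message: the larger the
"packed-refs" file, the more loose refs we tolerate before repacking.
The function names, the 100-bytes-per-entry estimate, the floor of 16,
the factor of 5 and the logarithmic scaling are all assumptions made for
this example and are not taken from the patch.

```python
# Hypothetical sketch: every constant and name here is an assumption
# chosen for illustration, not a value taken from the patch.

def loose_ref_limit(packed_refs_size: int,
                    avg_entry_bytes: int = 100,
                    floor: int = 16,
                    factor: int = 5) -> int:
    """Return how many loose refs to tolerate before repacking.

    The limit grows logarithmically with the estimated number of
    entries in the "packed-refs" file, so rewriting a large file is
    triggered less eagerly than rewriting a small one.
    """
    estimated_refs = packed_refs_size // avg_entry_bytes
    if estimated_refs < 2:
        return floor
    # bit_length() - 1 is floor(log2(n)) for positive integers and
    # avoids floating-point rounding.
    log2_refs = estimated_refs.bit_length() - 1
    return max(floor, log2_refs * factor)


def should_repack(loose_refs: int, packed_refs_size: int) -> bool:
    """Repack only once the loose ref count exceeds the scaled limit."""
    return loose_refs > loose_ref_limit(packed_refs_size)
```

With these assumed constants, a user touching four or five feature
branches stays well below the floor, whereas a busy remote crosses the
limit quickly and gets repacked, with no timestamps involved and the
same inputs always giving the same decision.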

Patrick



