Re: [PATCH v4 0/8] repack: support repacking into a geometric sequence

On Tuesday, February 23, 2021 3:15:12 PM MST Jeff King wrote:
> On Tue, Feb 23, 2021 at 12:54:56PM -0700, Martin Fick wrote:
> > > Yeah, this is definitely a heuristic that can get out of sync with
> > > reality. I think in general if you have base pack A and somebody pushes
> > > up B, C, and D in sequence, we're likely to roll them up into a
> > > single DBC (in that order) pack. Further pushes E, F, and G would
> > > have newer mtimes, so we might get GFEDBC directly. Or we might get
> > > GFE and DBC, but the former would still have a newer mtime, so we'd
> > > create GFEDBC on the next run.
> > > 
> > > The issues come from:
> > >   - we are deciding what to roll up based on size. A big push might
> > >     not get rolled up immediately, putting it out-of-sync with the
> > >     rest of the rollups.
> > 
> > Would it make sense to somehow detect all new packs since the last
> > rollup and always include them in the rollup, no matter what their
> > size? That is one thing my git-exproll script did. One of the main
> > reasons was that newer packs tended to look big (I was using byte
> > size), and a newer pack was often bigger on disk than other packs
> > containing similar objects (I think you suggested this was due to the
> > thickening of packs on receipt). Maybe roll up all packs with a
> > timestamp "new enough", no matter how big they are?
> 
> That works against the "geometric" part of the strategy, which is trying
> to roll up in a sequence that is amortized-linear. I.e., we are not
> always rolling up everything outside of the base pack, but trying to
> roll up little into medium, and then eventually medium into large. If
> you roll up things that are "too big", then you end up rewriting the
> bytes more often, and your amount of work becomes super-linear.

I'm not sure I follow; it seems to me the total work would stay linear,
rewriting each new packfile at most once more than before. Are you
envisioning more work than that?
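
To make sure we're picturing the same heuristic, here is a rough sketch
of how I understand the geometric split to work, based on the cover
letter (this is not the series' actual code; the function name, the use
of byte sizes rather than object counts, and the exact comparison are
all my guesses):

#include <stddef.h>
#include <stdint.h>

/*
 * Sketch: given pack sizes sorted ascending, decide how many of the
 * smallest packs to roll up so that every pack we keep is at least
 * `factor` times larger than the combined size of all packs below it.
 */
static size_t geometric_split(const uint64_t *sizes, size_t n,
			      uint64_t factor)
{
	uint64_t sum = 0;
	size_t split = 0;

	for (size_t i = 0; i < n; i++) {
		/*
		 * Pack i breaks the geometric progression if it is
		 * smaller than factor times the sum of everything
		 * below it, so it (and all smaller packs) must be
		 * rolled up.
		 */
		if (sizes[i] < factor * sum)
			split = i + 1;
		sum += sizes[i];
	}
	return split; /* roll up packs [0, split) into one */
}

With factor=2 and sizes 1, 1, 1, 8, 32, that rolls the first three
packs into one of size ~3 and leaves 8 and 32 alone (8 >= 2*3 holds).
If I read you right, your concern is that forcing a "too big" new pack
into the rollup anyway means its bytes get rewritten again at the next
level up, and that is where the work goes super-linear.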

> Now whether that matters all that much or not is perhaps another
> discussion. The current strategy is mostly to repack all-into-one with
> no base, which is the worst possible case. So just about any rollup
> strategy will be an improvement. ;)

+1 Yes, while almost any rollup strategy would be an improvement, this
series' approach is a very good one. Thanks for doing this!
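
For anyone following along at home, my understanding is that with this
series the rollup above is driven by the new option, roughly like so
(option spelling per the patch subjects; the factor of 2 is just an
example):

$ git repack --geometric=2 -d

i.e. roll up the smallest packs until each remaining pack is at least
twice as large as the sum of everything smaller, then drop the
now-redundant packs.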

-Martin

-- 
The Qualcomm Innovation Center, Inc. is a member of Code 
Aurora Forum, hosted by The Linux Foundation



