Re: [PATCH v4 0/8] repack: support repacking into a geometric sequence

On Tuesday, February 23, 2021 1:44:05 PM MST Jeff King wrote:
> On Mon, Feb 22, 2021 at 11:43:22PM -0800, Junio C Hamano wrote:
> > >     ++	/*
> > >     ++	 * order packs by descending mtime so that objects are laid out
> > >     ++	 * roughly as newest-to-oldest
> > >     ++	 */
> > >     
> > >      +	if (a->mtime < b->mtime)
> > >      +		return 1;
> > >     
> > >     ++	else if (b->mtime < a->mtime)
> > >     ++		return -1;
> > >     
> > >      +	else
> > >      +		return 0;
> > 
> > I think this strategy makes sense when this repack using this new
> > feature is run for the first time in a repository that acquired many
> > packs over time.  I am not sure what happens after the feature is
> > used a few times---it won't always be the newest sets of packs that
> > will be rewritten, but sometimes older ones are also coalesced, and
> > when that happens the resulting pack that consists primarily of older
> > objects would end up having a more recent timestamp, no?
> 
> Yeah, this is definitely a heuristic that can get out of sync with
> reality. I think in general if you have base pack A and somebody pushes
> up B, C, and D in sequence, we're likely to roll up a single DBC (in
> that order) pack. Further pushes E, F, G would have newer mtimes. So we
> might get GFEDBC directly. Or we might get GFE and DBC, but the former
> would still have a newer mtime, so we'd create GFEDBC on the next run.
> 
> The issues come from:
> 
>   - we are deciding what to roll up based on size. A big push might not
>     get rolled up immediately, putting it out-of-sync with the rest of
>     the rollups.

Would it make sense to detect all packs created since the last rollup and
always include them in the rollup, regardless of their size? That is one thing
that my git-exproll script did. One of the main reasons for doing this was that
newer packs tended to look big (I was using bytes to determine size), and
newer packs were often bigger on disk than other packs with similar objects
in them (I think you suggested this was due to the thickening of packs on
receipt). Maybe roll up all packs with a timestamp that is "new enough", no
matter how big they are?
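
To make the "new enough" idea concrete, here is a minimal sketch. It is not
code from this series or from git-exproll: `struct pack_info`,
`mark_new_packs()` and the `last_rollup` timestamp are hypothetical names, and
the sketch assumes the size-based heuristic has already set `rollup` on the
packs it selected.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for a pack; this is not git's struct packed_git. */
struct pack_info {
	time_t mtime;	/* modification time of the pack file */
	int rollup;	/* nonzero if the size heuristic already chose it */
};

/*
 * Additionally mark every pack whose mtime is strictly newer than
 * last_rollup, regardless of its size. Returns the total number of
 * packs marked for rollup after this pass.
 */
static size_t mark_new_packs(struct pack_info *packs, size_t nr,
			     time_t last_rollup)
{
	size_t i, marked = 0;

	for (i = 0; i < nr; i++) {
		if (packs[i].mtime > last_rollup)
			packs[i].rollup = 1;
		if (packs[i].rollup)
			marked++;
	}
	return marked;
}
```

How `last_rollup` would be recorded (for example, the mtime of the pack
written by the previous rollup) is left open here; the sketch only shows the
selection step layered on top of the size-based decision.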

-Martin

-- 
The Qualcomm Innovation Center, Inc. is a member of Code 
Aurora Forum, hosted by The Linux Foundation



