Re: [PATCH v2 00/17] pack-objects: add --path-walk option for better deltas

On Mon, Oct 28, 2024 at 01:13:15PM -0400, Derrick Stolee wrote:
> On 10/28/24 12:47 PM, Taylor Blau wrote:
> > On Mon, Oct 28, 2024 at 06:46:07AM +0100, Patrick Steinhardt wrote:
> > > I've flagged this internally now at GitLab so that we can provide some
> > > more data with some of the repos that are on the bigger side to check
> > > whether we can confirm the findings and to prioritize its review.
> >
> > I suspect that you'll end up measuring no change assuming that you
> > (AFAIK) use bitmaps and (I imagine) delta islands in your production
> > configuration? This series is not compatible with either of those
> > features to my knowledge.
> You are correct that this is not compatible with those features as-is.
> _Maybe_ there is potential to integrate them in the future, but that
> would require better understanding of whether the new compression
> mechanism is valuable in enough cases (final storage size or maybe even
> in repacking time).

I think the bitmap thing is not too big of a hurdle. The .bitmap file is
the only spot we store name-hash values on-disk in the "hashcache"
extension.

Unfortunately, there is no easy way to reuse the existing hashcache
extension's format as-is and still tell the reader whether it records
traditional name-hash values or the new --path-walk hash values.

I suspect that you could either add a new extension for --path-walk hash
values, or add a new variant of the hashcache extension that has a flag
to indicate what kind of hash value it's recording.

Of the two, I think the latter is preferable, since it would allow us to
grow new hash functions on paths in the future without needing to add an
additional extension (only a new bit in the existing one).
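
To make the second option a bit more concrete, here is a rough sketch of
what a flagged hashcache variant could look like. This is not git's
on-disk format: the struct, the enum, the mask, and every hc_* name below
are made up for illustration, and I use a two-bit field rather than a
single bit only to show how further hash functions could be added later.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout: none of these names or bit values come from
 * git's actual .bitmap format. They only illustrate the idea of a
 * hashcache variant that records which hash function produced it.
 */
#define HC_KIND_MASK 0x3u /* low two bits select the hash function */

enum hc_hash_kind {
	HC_HASH_TRADITIONAL = 0, /* classic name-hash values */
	HC_HASH_PATH_WALK = 1,   /* new --path-walk hash values */
	/* values 2 and 3 stay free for future path hash functions */
};

struct hc_header {
	uint32_t flags; /* hash kind lives in HC_KIND_MASK */
	uint32_t nr;    /* number of cached hash values that follow */
};

/* Writer side: record which hash function produced the cached values. */
static inline void hc_set_kind(struct hc_header *h, enum hc_hash_kind kind)
{
	assert(((uint32_t)kind & ~HC_KIND_MASK) == 0);
	h->flags = (h->flags & ~HC_KIND_MASK) | (uint32_t)kind;
}

/* Reader side: recover the hash kind from the flags word. */
static inline enum hc_hash_kind hc_kind(const struct hc_header *h)
{
	return (enum hc_hash_kind)(h->flags & HC_KIND_MASK);
}

/*
 * A reader should only reuse cached values if they were computed with
 * the hash function it is about to use; otherwise it must recompute.
 */
static inline int hc_usable(const struct hc_header *h, enum hc_hash_kind want)
{
	return hc_kind(h) == want;
}
```

The point of the sketch is that a reader checks the recorded kind before
trusting the cached values, and that adding another hash function later
only means assigning another value in the same field.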

> At the very least, it would be helpful if some other large repos were
> tested to see how commonly this could help client-side users. Are
> there other aspects to a repo's structure that could be important to
> how effective this approach is?

What measurements are you looking for here? I thought that you had
already done an extensive job of measuring the client-side impact of
pushing smaller packs and faster local repacks, no?

Thanks,
Taylor