On Mon, Oct 28, 2024 at 01:13:15PM -0400, Derrick Stolee wrote:
> On 10/28/24 12:47 PM, Taylor Blau wrote:
> > On Mon, Oct 28, 2024 at 06:46:07AM +0100, Patrick Steinhardt wrote:
> > > I've flagged this internally now at GitLab so that we can provide some
> > > more data with some of the repos that are on the bigger side to check
> > > whether we can confirm the findings and to prioritize its review.
> >
> > I suspect that you'll end up measuring no change assuming that you
> > (AFAIK) use bitmaps and (I imagine) delta islands in your production
> > configuration? This series is not compatible with either of those
> > features to my knowledge.
>
> You are correct that this is not compatible with those features as-is.
> _Maybe_ there is potential to integrate them in the future, but that
> would require better understanding of whether the new compression
> mechanism valuable in enough cases (final storage size or maybe even
> in repacking time).

I think the bitmap thing is not too big of a hurdle. The .bitmap file is
the only spot we store name-hash values on-disk in the "hashcache"
extension.

Unfortunately, there is no easy way to reuse the format of the existing
hashcache extension as-is to indicate to the reader whether they are
recording traditional name-hash values, or the new --path-walk hash
values.

I suspect that you could either add a new extension for --path-walk hash
values, or add a new variant of the hashcache extension that has a flag
to indicate what kind of hash value it's recording.

Of the two, I think the latter is preferred, since it would allow us to
grow new hash functions on paths in the future without needing to add an
additional extension (only a new bit in the existing one).

> At the very least, it would be helpful if some other large repos were
> tested to see how commonly this could help client-side users. Are
> there other aspects to a repo's structure that could be important to
> how effective this approach is?

What measurements are you looking for here? I thought that you had
already done an extensive job of measuring the client-side impact of
pushing smaller packs and faster local repacks, no?

Thanks,
Taylor