On Tue, May 21, 2024 at 03:01:38PM -0400, Taylor Blau wrote: > Here is another reroll my topic to introduce pseudo-merge bitmaps. OK, I got through the whole thing. I left a few small comments, but mostly just observations. Overall, the shape of it looks pretty good. The much bigger question to me is: does it work? The perf results you showed at the end are underwhelming, but I think that's mostly because it's not an interesting repository. I think it would be nice to see at least some point-in-time benchmarks on a single repository. But much more interesting to me is how it performs in the real world in aggregate, over time: - how often / how much do pseudo-merges speed up queries in the real world. Clones/fetches, but also reachability queries. Could connectivity checks use bitmaps with this? - how often do pseudo-merge groups get invalidated by refs changing (and thus we lose the speedups from above)? - what's the cost like to generate them initially? - what's the cost like for subsequent repacks? Does the selection / grouping algorithm do a good job of keeping the older, larger groups stable (so that we can reuse them verbatim)? I know you don't have those answers yet, and I know there's some chicken-and-egg with getting this integrated so that you can start to explore that. So I mostly reviewed this with an eye towards: - does the idea make sense (I think it does, but I'm kind of biased) - are the patches going to hurt anybody who isn't using the new feature (I think the answer is no) - does the on-disk representation seem right, since that is hard to change later (I didn't see any issues) - does the implementation look clean and plausibly correct (yes, but what I'm getting at is that I didn't pore over all of the new code with a microscope. Mostly I think the proof is in the pudding that it provides the same correct answers more quickly). So to my eyes it looks good to move forward and let people start playing with it. The big "experimental" warning in the config is good. Maybe we'd want want in gitpacking(7), too? I did wonder briefly what the backup plan is if there are problems with the on-disk format (or worst case, if it turns out to be a dead end). We've allocated a flag bit which I think we'd need to respect forever (though as an optional extension, it's OK to understand and ignore it). If we needed a "pseudo-merge bitmaps v2", how would that work? I think we'd have to allocate another bit in the flags field. I wonder if the start of the pseudo-merge section should have a 4-byte version/flags field itself? I don't think that's something we've done before, and maybe it's overkill. I dunno. It's just a lot easier to do now than later. -Peff