On Mon, May 06, 2024 at 01:52:50PM +0200, Patrick Steinhardt wrote: > On Mon, Apr 29, 2024 at 04:43:08PM -0400, Taylor Blau wrote: > > The pack-bitmap-writer machinery uses a oidmap (backed by khash.h) to > > map from commits selected for bitmaps (by OID) to a bitmapped_commit > > structure (containing the bitmap itself, among other things like its XOR > > offset, etc.) > > > > This map was initialized at the end of `bitmap_writer_build()`. New > > entries are added in `pack-bitmap-write.c::store_selected()`, which is > > called by the bitmap_builder machinery (which is responsible for > > traversing history and generating the actual bitmaps). > > > > Reorganize when this field is initialized and when entries are added to > > it so that we can quickly determine whether a commit is a candidate for > > pseudo-merge selection, or not (since it was already selected to receive > > a bitmap, and thus is ineligible for pseudo-merge inclusion). > > I feel like this last sentence here could use some more explanation as > the restriction has never been explained before. Is this a strict > requirement, or is this rather "It would be wasted anyway"? Thanks, that's a great call out. I reworded this sentence to clarify that it's redundant, but not a strict requirement. I think that's a sufficient amount of detail to motivate the change, but not so much that it distracts from the change at hand. > > The changes are as follows: > > > > - Introduce a new `bitmap_writer_init()` function which initializes > > the `writer.bitmaps` field (instead of waiting until the end of > > `bitmap_writer_build()`). > > > > - Add map entries in `push_bitmapped_commit()` (which is called via > > `bitmap_writer_select_commits()`) with OID keys and NULL values to > > track whether or not we *expect* to write a bitmap for some given > > commit. > > > > - Validate that a NULL entry is found matching the given key when we > > store a selected bitmap. > > It would be great if this refactoring went way further. Right now it's > quite hard to verify whether the writer has really been initialized in > all the right places because it is a global variable. Ideally, the whole > interface should be refactored to take the writer as input instead, > where `bitmap_writer_init()` would then initialize the local variables. > > That'd of course be a bigger refactoring and may or may not be a good > fit for this patch series. But I'd very much love to see such a refactor > as a follow-up series. Yeah, I definitely agree here ;-). I will plan on doing this as a follow-up to this series. > [snip] > > diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c > > index c35bc81d00f..9bc41a9e145 100644 > > --- a/pack-bitmap-write.c > > +++ b/pack-bitmap-write.c > > @@ -46,6 +46,11 @@ struct bitmap_writer { > > > > static struct bitmap_writer writer; > > > > +void bitmap_writer_init(struct repository *r) > > +{ > > + writer.bitmaps = kh_init_oid_map(); > > +} > > Given the other safety belts, do we also want to BUG here in case the > bitmap has already been initialized? Great suggestion, thanks. Thanks, Taylor