On Sat, Aug 17, 2024 at 06:31:55AM -0400, Jeff King wrote: > On Thu, Aug 15, 2024 at 01:31:00PM -0400, Taylor Blau wrote: > > > In order to determine its object order, the pack-bitmap machinery keeps > > a 'struct packing_data' corresponding to the pack or pseudo-pack (when > > writing a MIDX bitmap) being written. > > > > The to_pack field is provided to the bitmap machinery by callers of > > bitmap_writer_build() and assigned to the bitmap_writer struct at that > > point. > > > > But a subsequent commit will want to have access to that data earlier on > > during commit selection. Prepare for that by adding a 'to_pack' argument > > to 'bitmap_writer_init()', and initializing the field during that > > function. > > > > Subsequent commits will clean up other functions which take > > now-redundant arguments (like nr_objects, which is equivalent to > > pdata->objects_nr, or pdata itself). > > This (and the next few follow-on commits) seem like a good change to me. > It simplifies many of the function calls, and I think it expresses the > domain logic in the API: there is a single set of objects being mapped > to bits, and many parts of the process will rely on it. Thanks. Yeah, it was a little surprising to me that it wasn't already this way, especially having worked in this area for so long. I suspect it grew this way organically over time (though haven't actually gone spelunking through the history to confirm). > Even the midx code, which is not generating a pack, uses a "fake" > packing_data as the way to express that (because inherently the bit > ordering is all coming from the pack-index nature). If we likewise ever > wrote code to generate bitmaps from an existing pack, it would probably > use packing_data, too. :) I agree for the most part, though there is a lot of weight in packing_data that would be nice to not have to carry around. I know within GitHub's infrastructure we sometimes OOM kill invocations of "git multi-pack-index write --bitmap" because of the memory overhead (a lot of which is dominated by the actual traversal and bitmap generation, but a lot that comes from just the per-object overhead). I've thought about alternative structures that might be a little more memory efficient, but it's never gotten to the top of my list. Thanks, Taylor