On Thu, Aug 29, 2024 at 03:00:21PM -0400, Taylor Blau wrote:

> > Even the midx code, which is not generating a pack, uses a "fake"
> > packing_data as the way to express that (because inherently the bit
> > ordering is all coming from the pack-index nature). If we likewise ever
> > wrote code to generate bitmaps from an existing pack, it would probably
> > use packing_data, too. :)
>
> I agree for the most part, though there is a lot of weight in
> packing_data that would be nice to not have to carry around. I know
> within GitHub's infrastructure we sometimes OOM kill invocations of "git
> multi-pack-index write --bitmap" because of the memory overhead (a lot
> of which is dominated by the actual traversal and bitmap generation, but
> a lot that comes from just the per-object overhead).
>
> I've thought about alternative structures that might be a little more
> memory efficient, but it's never gotten to the top of my list.

True. What the index and bitmap steps really want is not an array of
object_entry, but an array of pack_idx_entry (which is the first
component of an object_entry). I wonder how feasible it would be to
simply hold two arrays with corresponding entries at each index. Many
places only care about one or the other. But for places that do care
about both, especially ones that receive a pointer to an individual
object_entry, they'd need to receive pointers to both.

I briefly looked at the compile errors that come from making such a
change. Many of them look trivial, but I think some of them get weird
(the ext_bases array is also holding object_entry structs).

So maybe worth pursuing in the long run, but not something to knock out
this afternoon.

-Peff
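
For illustration, the two-array idea above might look roughly like the
sketch below. This is not git's actual code: every name ending in
"_sketch" is hypothetical, the struct fields are simplified stand-ins
rather than the real pack_idx_entry and object_entry layouts, and
ext_bases is ignored entirely. It only shows two arrays kept in
lockstep by index, with the pack-only array left unallocated for
callers (like the midx case) that never generate a pack.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_OID_LEN 20

/* Simplified stand-in for the index-related part of an entry. */
struct idx_entry_sketch {
	unsigned char oid[SKETCH_OID_LEN];
	uint32_t crc32;
	uint64_t offset;
};

/* Hypothetical pack-generation-only state (fields are illustrative). */
struct pack_extra_sketch {
	uint32_t size;
	uint32_t delta_idx;
};

struct packing_sketch {
	struct idx_entry_sketch *idx;    /* what index/bitmap writers read */
	struct pack_extra_sketch *extra; /* stays NULL unless want_extra is set */
	uint32_t nr, alloc;
	int want_extra;
};

/* Grow both arrays together so entry i always lines up in each. */
static int sketch_reserve(struct packing_sketch *p, uint32_t want)
{
	uint32_t alloc;
	void *tmp;

	if (want <= p->alloc)
		return 0;
	alloc = p->alloc ? p->alloc * 2 : 16;
	if (alloc < want)
		alloc = want;

	tmp = realloc(p->idx, alloc * sizeof(*p->idx));
	if (!tmp)
		return -1;
	p->idx = tmp;

	if (p->want_extra) {
		tmp = realloc(p->extra, alloc * sizeof(*p->extra));
		if (!tmp)
			return -1;
		p->extra = tmp;
	}
	p->alloc = alloc;
	return 0;
}

/* Append a zeroed entry for oid; returns its index, or UINT32_MAX on failure. */
static uint32_t sketch_add(struct packing_sketch *p, const unsigned char *oid)
{
	uint32_t i = p->nr;

	if (sketch_reserve(p, i + 1) < 0)
		return UINT32_MAX;
	memset(&p->idx[i], 0, sizeof(p->idx[i]));
	memcpy(p->idx[i].oid, oid, SKETCH_OID_LEN);
	if (p->want_extra)
		memset(&p->extra[i], 0, sizeof(p->extra[i]));
	p->nr++;
	return i;
}

/*
 * A caller that cares about both halves now takes two pointers where it
 * used to take a single entry pointer. Records the write offset in the
 * index half and returns the offset just past this object.
 */
static uint64_t sketch_record_write(struct idx_entry_sketch *idx,
				    const struct pack_extra_sketch *extra,
				    uint64_t offset)
{
	assert(idx && extra);
	idx->offset = offset;
	return offset + extra->size;
}
```

The cost shows up in signatures like sketch_record_write(): anything
that took one entry pointer and touched both halves has to take two
pointers (or the container plus an index) instead.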