On Wed, Oct 09, 2024 at 04:31:28PM -0400, Taylor Blau wrote: > In order to do this, the pack-reuse code within pack-bitmap.c marks > bits in a separate bitmap called 'reuse_as_ref_delta'. Objects whose > bits are marked in that bitmap must be converted from OFS_DELTAs to > REF_DELTAs. > > To mark bits in that bitmap, we adjust find_base_bitmap_pos() to > return the bitmap position of any delta object's base regardless of > whether or not it came from the same pack. This is done by: > > 1. First converting the base object's into a pack position (via > `offset_to_pack_pos()`). > > 2. Then converting from pack position into into lexical order (via > `pack_pos_to_index()`). > > 3. Then into an object ID (via `nth_packed_object_id()`). > > 4. Then into a position in the MIDX's lexical order of object IDs > (via `bsearch_midx()`). > > 5. And finally into a position in the MIDX's pseudo-pack order (via > `midx_pair_to_pack_pos()`). > > If we can find that base object in the MIDX, then we use its position > in the MIDX's pseudo-pack order to determine whether or not it was > selected from the same pack. If it is, no adjustment is necessary. > Otherwise, we mark the delta object's position in the new > `reuse_as_ref_delta` bitmap, and convert accordingly from within > `write_reused_pack_one()`. OK, that makes sense. It does feel like a non-trivial amount of work to do for each delta we're going to (potentially) reuse from a midx'd pack. Can we recognize the common case that the base is in the same pack and also being sent/reused without doing the full conversion to oid and the resulting bsearch? > @@ -1182,10 +1188,24 @@ static size_t write_reused_pack_verbatim(struct bitmapped_pack *reuse_packfile, > if (pos >= end) > return reuse_packfile->bitmap_pos / BITS_IN_EWORD; > > - while (pos < end && > - reuse_packfile_bitmap->words[pos / BITS_IN_EWORD] == (eword_t)~0) > + while (pos < end) { > + size_t wpos = pos / BITS_IN_EWORD; > + eword_t reuse; > + > + reuse = reuse_packfile_bitmap->words[wpos]; > + if (reuse_as_ref_delta_packfile_bitmap) { > + /* > + * Can't reuse verbatim any objects which need > + * to be first rewritten as REF_DELTAs. > + */ > + reuse &= ~reuse_as_ref_delta_packfile_bitmap->words[wpos]; > + } > + > + if (reuse != (eword_t)~0) > + break; > + > pos += BITS_IN_EWORD; > - > + } This is accessing reuse_as_ref_delta_packfile_bitmap->words directly using pos/end as limits. But those come from reuse_packfile_bitmap. Are we guaranteed to have zero-extended the reuse_as_ref_delta bitmap as far as the original went? Could we just be calling bitmap_get() here, which would do the length check for us? Though I guess we would miss out on some whole-word magic it is doing. So maybe we need to just do that length check ourselves. -Peff