[PATCH 09/11] pack-bitmap: enable cross-pack delta reuse

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Back when multi-pack bitmaps were first introduced, we performed
verbatim pack reuse over the MIDX's preferred pack only. Since all
duplicate objects were resolved in favor of that pack, any object from
the preferred pack that was stored as an OFS_DELTA is directly
reusable.

That's not the case when we are reusing objects from multiple packs,
like was introduced via af626ac0e0 (pack-bitmap: enable reuse from
all bitmapped packs, 2023-12-14). When reusing an object stored as a
delta from some pack other than the preferred one, it's possible that
the base was selected from a different pack.

When that's the case, we could have converted those OFS_DELTAs to
REF_DELTAs on the fly (like we do when running 'pack-objects' without
the `--delta-base-offset` option). But af626ac0e0 decided instead to
kick those objects back into the non-verbatim reuse path, meaning that
reusing them was slightly slower.

This patch removes that limitation, and teaches pack-objects to
convert deltas (whose base was selected from a different pack in the
MIDX) from OFS_DELTAs to REF_DELTAs.

In order to do this, the pack-reuse code within pack-bitmap.c marks
bits in a separate bitmap called 'reuse_as_ref_delta'. Objects whose
bits are marked in that bitmap must be converted from OFS_DELTAs to
REF_DELTAs.

To mark bits in that bitmap, we adjust find_base_bitmap_pos() to
return the bitmap position of any delta object's base regardless of
whether or not it came from the same pack. This is done by:

  1. First converting the base object's into a pack position (via
     `offset_to_pack_pos()`).

  2. Then converting from pack position into into lexical order (via
     `pack_pos_to_index()`).

  3. Then into an object ID (via `nth_packed_object_id()`).

  4. Then into a position in the MIDX's lexical order of object IDs
     (via `bsearch_midx()`).

  5. And finally into a position in the MIDX's pseudo-pack order (via
     `midx_pair_to_pack_pos()`).

If we can find that base object in the MIDX, then we use its position
in the MIDX's pseudo-pack order to determine whether or not it was
selected from the same pack. If it is, no adjustment is necessary.
Otherwise, we mark the delta object's position in the new
`reuse_as_ref_delta` bitmap, and convert accordingly from within
`write_reused_pack_one()`.

The only tweak we need to make outside of the above is to teach
`write_reused_pack_verbatim()` to only consider words of the
`reuse_packfile_bitmap` as reusable verbatim if the word does not have
any bits which are marked in the reuse_as_ref_delta bitmap, in which
case they must be converted and cannot be reused verbatim.

Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx>
---
 builtin/pack-objects.c      | 29 ++++++++++++++---
 pack-bitmap.c               | 63 +++++++++++++++++++++++++++----------
 pack-bitmap.h               |  1 +
 t/t5332-multi-pack-reuse.sh |  4 +--
 4 files changed, 75 insertions(+), 22 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 7f50d58a235..dbcd632483e 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -223,6 +223,7 @@ static size_t reuse_packfiles_nr;
 static size_t reuse_packfiles_used_nr;
 static uint32_t reuse_packfile_objects;
 static struct bitmap *reuse_packfile_bitmap;
+static struct bitmap *reuse_as_ref_delta_packfile_bitmap;
 
 static int use_bitmap_index_default = 1;
 static int use_bitmap_index = -1;
@@ -1084,7 +1085,8 @@ static void write_reused_pack_one(struct packed_git *reuse_packfile,
 		assert(base_offset != 0);
 
 		/* Convert to REF_DELTA if we must... */
-		if (!allow_ofs_delta) {
+		if (!allow_ofs_delta ||
+		    bitmap_get(reuse_as_ref_delta_packfile_bitmap, pos)) {
 			uint32_t base_pos;
 			struct object_id base_oid;
 
@@ -1160,6 +1162,10 @@ static size_t write_reused_pack_verbatim(struct bitmapped_pack *reuse_packfile,
 			if (!bitmap_get(reuse_packfile_bitmap,
 					word_pos * BITS_IN_EWORD + offset))
 				return word_pos;
+			if (reuse_as_ref_delta_packfile_bitmap &&
+			    bitmap_get(reuse_as_ref_delta_packfile_bitmap,
+				       word_pos * BITS_IN_EWORD + offset))
+				return word_pos;
 		}
 
 		pos += BITS_IN_EWORD - (pos % BITS_IN_EWORD);
@@ -1182,10 +1188,24 @@ static size_t write_reused_pack_verbatim(struct bitmapped_pack *reuse_packfile,
 	if (pos >= end)
 		return reuse_packfile->bitmap_pos / BITS_IN_EWORD;
 
-	while (pos < end &&
-	       reuse_packfile_bitmap->words[pos / BITS_IN_EWORD] == (eword_t)~0)
+	while (pos < end) {
+		size_t wpos = pos / BITS_IN_EWORD;
+		eword_t reuse;
+
+		reuse = reuse_packfile_bitmap->words[wpos];
+		if (reuse_as_ref_delta_packfile_bitmap) {
+			/*
+			 * Can't reuse verbatim any objects which need
+			 * to be first rewritten as REF_DELTAs.
+			 */
+			reuse &= ~reuse_as_ref_delta_packfile_bitmap->words[wpos];
+		}
+
+		if (reuse != (eword_t)~0)
+			break;
+
 		pos += BITS_IN_EWORD;
-
+	}
 	if (pos > end)
 		pos = end;
 
@@ -4078,6 +4098,7 @@ static int get_object_list_from_bitmap(struct rev_info *revs)
 						   &reuse_packfiles,
 						   &reuse_packfiles_nr,
 						   &reuse_packfile_bitmap,
+						   &reuse_as_ref_delta_packfile_bitmap,
 						   allow_pack_reuse == MULTI_PACK_REUSE);
 
 	if (reuse_packfiles) {
diff --git a/pack-bitmap.c b/pack-bitmap.c
index b9ea1fab397..8326a20979e 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2054,18 +2054,34 @@ static int find_base_bitmap_pos(struct bitmap_index *bitmap_git,
 				uint32_t *base_bitmap_pos)
 {
 	if (bitmap_is_midx(bitmap_git)) {
+		struct object_id base_oid;
+		uint32_t base_pack_pos;
+		uint32_t base_idx_pos;
+		uint32_t base_midx_pos;
+
 		/*
-		 * Cross-pack deltas are rejected for now, but could
-		 * theoretically be supported in the future.
-		 *
-		 * We would need to ensure that we're sending both
-		 * halves of the delta/base pair, regardless of whether
-		 * or not the two cross a pack boundary. If they do,
-		 * then we must convert the delta to an REF_DELTA to
-		 * refer back to the base in the other pack.
-		 * */
-		if (midx_pair_to_pack_pos(bitmap_git->midx, pack->pack_int_id,
-					  base_offset, base_bitmap_pos) < 0)
+		 * First convert from the base object's offset
+		 * to its pack position in pack order relative
+		 * to its source pack. Use that information to
+		 * find the base object's OID.
+		 */
+		if (offset_to_pack_pos(pack->p, base_offset,
+				       &base_pack_pos) < 0)
+			return -1;
+		base_idx_pos = pack_pos_to_index(pack->p, base_pack_pos);
+		if (nth_packed_object_id(&base_oid, pack->p, base_idx_pos) < 0)
+			return -1;
+
+		/*
+		 * Now find the base object's lexical position
+		 * in the MIDX, and convert that to a bitmap
+		 * position in the MIDX's pseudo-pack order.
+		 */
+		if (!bsearch_midx(&base_oid, bitmap_git->midx,
+				  &base_midx_pos))
+			return -1;
+		if (midx_to_pack_pos(bitmap_git->midx, base_midx_pos,
+				     base_bitmap_pos) < 0)
 			return -1;
 	} else {
 		/*
@@ -2102,6 +2118,7 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
 			     size_t bitmap_pos,
 			     off_t offset,
 			     struct bitmap *reuse,
+			     struct bitmap *reuse_as_ref_delta,
 			     struct pack_window **w_curs)
 {
 	off_t delta_obj_offset;
@@ -2116,6 +2133,7 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
 	if (type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA) {
 		off_t base_offset;
 		uint32_t base_bitmap_pos;
+		int cross_pack;
 
 		/*
 		 * Find the position of the base object so we can look it up
@@ -2129,7 +2147,8 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
 					     delta_obj_offset);
 		if (!base_offset ||
 		    find_base_bitmap_pos(bitmap_git, pack, base_offset,
-					 delta_obj_offset, &base_bitmap_pos) < 0)
+					 delta_obj_offset,
+					 &base_bitmap_pos) < 0)
 			return 0;
 
 		/*
@@ -2142,8 +2161,11 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
 		 */
 		if (!bitmap_get(reuse, base_bitmap_pos))
 			return 0;
-	}
 
+		cross_pack = base_bitmap_pos < pack->bitmap_pos;
+		if (cross_pack)
+			bitmap_set(reuse_as_ref_delta, bitmap_pos);
+	}
 	/*
 	 * If we got here, then the object is OK to reuse. Mark it.
 	 */
@@ -2153,7 +2175,8 @@ static int try_partial_reuse(struct bitmap_index *bitmap_git,
 
 static void reuse_partial_packfile_from_bitmap_1(struct bitmap_index *bitmap_git,
 						 struct bitmapped_pack *pack,
-						 struct bitmap *reuse)
+						 struct bitmap *reuse,
+						 struct bitmap *reuse_as_ref_delta)
 {
 	struct bitmap *result = bitmap_git->result;
 	struct pack_window *w_curs = NULL;
@@ -2220,7 +2243,8 @@ static void reuse_partial_packfile_from_bitmap_1(struct bitmap_index *bitmap_git
 			}
 
 			if (try_partial_reuse(bitmap_git, pack, bit_pos,
-					      ofs, reuse, &w_curs) < 0) {
+					      ofs, reuse, reuse_as_ref_delta,
+					      &w_curs) < 0) {
 				/*
 				 * try_partial_reuse indicated we couldn't reuse
 				 * any bits, so there is no point in trying more
@@ -2255,12 +2279,14 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 					struct bitmapped_pack **packs_out,
 					size_t *packs_nr_out,
 					struct bitmap **reuse_out,
+					struct bitmap **reuse_as_ref_delta_out,
 					int multi_pack_reuse)
 {
 	struct repository *r = the_repository;
 	struct bitmapped_pack *packs = NULL;
 	struct bitmap *result = bitmap_git->result;
 	struct bitmap *reuse;
+	struct bitmap *reuse_as_ref_delta = NULL;
 	size_t i;
 	size_t packs_nr = 0, packs_alloc = 0;
 	size_t word_alloc;
@@ -2333,14 +2359,18 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 	word_alloc = objects_nr / BITS_IN_EWORD;
 	if (objects_nr % BITS_IN_EWORD)
 		word_alloc++;
+
 	reuse = bitmap_word_alloc(word_alloc);
+	reuse_as_ref_delta = bitmap_word_alloc(word_alloc);
 
 	for (i = 0; i < packs_nr; i++)
-		reuse_partial_packfile_from_bitmap_1(bitmap_git, &packs[i], reuse);
+		reuse_partial_packfile_from_bitmap_1(bitmap_git, &packs[i],
+						     reuse, reuse_as_ref_delta);
 
 	if (bitmap_is_empty(reuse)) {
 		free(packs);
 		bitmap_free(reuse);
+		bitmap_free(reuse_as_ref_delta);
 		return;
 	}
 
@@ -2352,6 +2382,7 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 	*packs_out = packs;
 	*packs_nr_out = packs_nr;
 	*reuse_out = reuse;
+	*reuse_as_ref_delta_out = reuse_as_ref_delta;
 }
 
 int bitmap_walk_contains(struct bitmap_index *bitmap_git,
diff --git a/pack-bitmap.h b/pack-bitmap.h
index a1e8c8936c9..e7962ac90e8 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -86,6 +86,7 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 					struct bitmapped_pack **packs_out,
 					size_t *packs_nr_out,
 					struct bitmap **reuse_out,
+					struct bitmap **reuse_as_ref_delta_out,
 					int multi_pack_reuse);
 int rebuild_existing_bitmaps(struct bitmap_index *, struct packing_data *mapping,
 			     kh_oid_map_t *reused_bitmaps, int show_progress);
diff --git a/t/t5332-multi-pack-reuse.sh b/t/t5332-multi-pack-reuse.sh
index 8bcb736c75a..6edff02d124 100755
--- a/t/t5332-multi-pack-reuse.sh
+++ b/t/t5332-multi-pack-reuse.sh
@@ -194,7 +194,7 @@ test_expect_success 'omit delta with uninteresting base (same pack)' '
 	test_pack_objects_reused 3 1 <in
 '
 
-test_expect_success 'omit delta from uninteresting base (cross pack)' '
+test_expect_success 'can retain delta from uninteresting base (cross pack)' '
 	cat >in <<-EOF &&
 	$(git rev-parse $base)
 	^$(git rev-parse $delta)
@@ -207,7 +207,7 @@ test_expect_success 'omit delta from uninteresting base (cross pack)' '
 	packs_nr="$(find $packdir -type f -name "pack-*.pack" | wc -l)" &&
 	objects_nr="$(git rev-list --count --all --objects)" &&
 
-	test_pack_objects_reused_all $(($objects_nr - 1)) $packs_nr
+	test_pack_objects_reused_all $objects_nr $packs_nr
 '
 
 test_expect_success 'non-omitted delta in MIDX preferred pack' '
-- 
2.47.0.11.g487258bca34





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux