[PATCH] pack-objects: handle island check for "external" delta base

On Fri, Sep 14, 2018 at 02:56:36PM -0700, Junio C Hamano wrote:

> * cc/delta-islands (2018-08-16) 7 commits
>   (merged to 'next' on 2018-08-27 at cf3d7bd93f)
>  + pack-objects: move 'layer' into 'struct packing_data'
>  + pack-objects: move tree_depth into 'struct packing_data'
>  + t5320: tests for delta islands
>  + repack: add delta-islands support
>  + pack-objects: add delta-islands support
>  + pack-objects: refactor code into compute_layer_order()
>  + Add delta-islands.{c,h}
> 
>  Lift code from GitHub to restrict delta computation so that an
>  object that exists in one fork is not made into a delta against
>  another object that does not appear in the same forked repository.
> 
>  Will merge to 'master'.

This needed some conflict resolution with my pack-bitmap-reuse-delta
topic, but there's a subtle bug in the result that went to 'master'.
Details and a fix below.

As a side note, I did this same resolution myself at least twice (for my
personal build and for porting the refreshed delta-reuse series to our
GitHub build), and I wrote the exact same resolution you did both times.
So I think it was an easy mistake to make. :)

-Peff

-- >8 --
Subject: pack-objects: handle island check for "external" delta base

Two recent topics, jk/pack-delta-reuse-with-bitmap and
cc/delta-islands, can have a funny interaction. When
checking if we can reuse an on-disk delta, the first topic
allows base_entry to be NULL when we find an object that's
not in the packing list. But the latter topic introduces a
call to in_same_island(), which needs to look at
base_entry->idx.oid. When these two features are used
together, we might try to dereference a NULL base_entry.
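
To make the interaction concrete, here is a minimal standalone sketch of
the shape of the pre-patch condition. It is not git code: struct entry,
mock_packlist_find(), mock_receiver_has(), and old_condition() are
hypothetical stand-ins for the packing-list lookup and the bitmap check,
and the island check itself is left out.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct object_entry. */
struct entry { int id; };

/* Hypothetical lookup: returns NULL when the base is not in the list. */
static struct entry *mock_packlist_find(struct entry *list, size_t n, int id)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (list[i].id == id)
			return &list[i];
	return NULL;
}

/* Hypothetical bitmap check: pretend the receiver has every even id. */
static int mock_receiver_has(int id)
{
	return id % 2 == 0;
}

/*
 * Same shape as the old "if": the base is either found in the list, or
 * (when thin) known to be on the receiver's side. Returns 1 if the
 * condition passes. *base_is_null is set to 1 when it passes via the
 * second branch, i.e. when "base" was never assigned a non-NULL value
 * but a following island check would still be handed &base->... .
 */
static int old_condition(struct entry *list, size_t n, int base_id,
			 int thin, int *base_is_null)
{
	struct entry *base = NULL;
	int ok = (base = mock_packlist_find(list, n, base_id)) ||
		 (thin && mock_receiver_has(base_id));

	*base_is_null = ok && !base;
	return ok;
}
```

With a list holding ids 1 and 3, asking for base 3 passes with a
non-NULL base. Asking for base 4 with thin set also passes, but base is
still NULL, which is the case where the original condition went on to
compute &base_entry->idx.oid.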

In practice, this doesn't really happen. We'd generally only
use delta islands when packing to disk, since the whole
point is to optimize the pack for serving fetches later. And
the new delta-reuse code relies on having used reachability
bitmaps to determine the set of objects, which we would
typically only do when serving an actual fetch.

However, it is technically possible to combine these
features. And even without doing so, building with
"SANITIZE=address,undefined" will cause t5310.46 to
complain.  Even though that test does not have delta islands
enabled, we still take the address of the NULL entry to pass
to in_same_island(). That function then promptly returns
without dereferencing the value when it sees that islands
are not enabled, but it's enough to trigger a sanitizer
error.

The solution is straightforward: when both features are
used together, we should pass the oid of the found base to
in_same_island().

This is tricky to do inside a single "if" statement. And
after the merge in f3504ea3dd (Merge branch
'cc/delta-islands', 2018-09-17), that "if" condition is
already getting pretty unwieldy. So this patch moves the
logic into a helper function, where we can easily use
multiple return paths. The result is a bit longer, but the
logic should be much easier to follow.

Signed-off-by: Jeff King <peff@xxxxxxxx>
---
 builtin/pack-objects.c | 68 ++++++++++++++++++++++++++++++++----------
 1 file changed, 52 insertions(+), 16 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 5041818ddf..27cb674124 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1470,6 +1470,57 @@ static void cleanup_preferred_base(void)
 	done_pbase_paths_num = done_pbase_paths_alloc = 0;
 }
 
+/*
+ * Return 1 iff the object specified by "delta" can be sent
+ * literally as a delta against the base in "base_sha1". If
+ * so, then *base_out will point to the entry in our packing
+ * list, or NULL if we must use the external-base list.
+ *
+ * Depth value does not matter - find_deltas() will
+ * never consider reused delta as the base object to
+ * deltify other objects against, in order to avoid
+ * circular deltas.
+ */
+static int can_reuse_delta(const unsigned char *base_sha1,
+			   struct object_entry *delta,
+			   struct object_entry **base_out)
+{
+	struct object_entry *base;
+
+	if (!base_sha1)
+		return 0;
+
+	/*
+	 * First see if we're already sending the base (or it's explicitly in
+	 * our "excluded" list).
+	 */
+	base = packlist_find(&to_pack, base_sha1, NULL);
+	if (base) {
+		if (!in_same_island(&delta->idx.oid, &base->idx.oid))
+			return 0;
+		*base_out = base;
+		return 1;
+	}
+
+	/*
+	 * Otherwise, reachability bitmaps may tell us if the receiver has it,
+	 * even if it was buried too deep in history to make it into the
+	 * packing list.
+	 */
+	if (thin && bitmap_has_sha1_in_uninteresting(bitmap_git, base_sha1)) {
+		if (use_delta_islands) {
+			struct object_id base_oid;
+			hashcpy(base_oid.hash, base_sha1);
+			if (!in_same_island(&delta->idx.oid, &base_oid))
+				return 0;
+		}
+		*base_out = NULL;
+		return 1;
+	}
+
+	return 0;
+}
+
 static void check_object(struct object_entry *entry)
 {
 	unsigned long canonical_size;
@@ -1556,22 +1607,7 @@ static void check_object(struct object_entry *entry)
 			break;
 		}
 
-		if (base_ref && (
-		    (base_entry = packlist_find(&to_pack, base_ref, NULL)) ||
-		    (thin &&
-		     bitmap_has_sha1_in_uninteresting(bitmap_git, base_ref))) &&
-		    in_same_island(&entry->idx.oid, &base_entry->idx.oid)) {
-			/*
-			 * If base_ref was set above that means we wish to
-			 * reuse delta data, and either we found that object in
-			 * the list of objects we want to pack, or it's one we
-			 * know the receiver has.
-			 *
-			 * Depth value does not matter - find_deltas() will
-			 * never consider reused delta as the base object to
-			 * deltify other objects against, in order to avoid
-			 * circular deltas.
-			 */
+		if (can_reuse_delta(base_ref, entry, &base_entry)) {
 			oe_set_type(entry, entry->in_pack_type);
 			SET_SIZE(entry, in_pack_size); /* delta size */
 			SET_DELTA_SIZE(entry, in_pack_size);
-- 
2.19.0.745.g75ede3edf3



