[PATCH v2 3/3] connected: implement connectivity check using bitmaps

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The connectivity checks are currently implemented via git-rev-list(1):
we simply ignore any objects which are reachable from preexisting refs
via `--not --all`, and pass all new refs which are to be checked via its
stdin. While this works well enough for repositories which do not have a
lot of references, it's clear that `--not --all` will linearly scale
with the number of refs which do exist: for each reference, we'll walk
through its commit as well as its five parent commits (defined via
`SLOP`). Given that many major hosting sites which use a pull/merge
request workflow keep refs to the request's HEAD, this effectively means
that `--not --all` will do a nearly complete graph walk.

This commit implements an alternate strategy if the target repository
has bitmaps. Objects referenced by a bitmap are by definition always
fully connected, so they form a natural boundary between old revisions
and new revisions. With this alternate strategy, we walk all given new
object IDs. Whenever we hit any object which is covered by the bitmap,
we stop the walk.

The logic only kicks in in case we have a bitmap in the repository. If
not, we wouldn't be able to efficiently abort the walk because we cannot
easily tell whether an object already exists in the target repository
and, if it does, whether it's fully connected. As a result, we'd have to
perform a always do graph walk in this case. Naturally, we could do the
same thing the previous git-rev-list(1) implementation did in that case
and just use preexisting branch tips as boundaries. But for now, we just
keep the old implementation for simplicity's sake given that performance
characteristics are likely not significantly different.

An easier solution may have been to simply add `--use-bitmap-index` to
the git-rev-list(1) invocation, but benchmarks have shown that this is
not effective: performance characteristics remain the same except for
some cases where the bitmap walks performs much worse compared to the
non-bitmap walk

The following benchmarks have been performed in linux.git:

Test                                                  origin/master           HEAD
---------------------------------------------------------------------------------------------------------
5400.4: empty receive-pack updated:new                176.02(387.28+175.12)   176.86(388.75+175.51) +0.5%
5400.7: clone receive-pack updated:new                0.10(0.09+0.01)         0.08(0.06+0.03) -20.0%
5400.9: clone receive-pack updated:main               0.09(0.08+0.01)         0.09(0.06+0.03) +0.0%
5400.11: clone receive-pack main~10:main              0.14(0.11+0.03)         0.13(0.11+0.02) -7.1%
5400.13: clone receive-pack :main                     0.01(0.01+0.00)         0.02(0.01+0.00) +100.0%
5400.16: clone_bitmap receive-pack updated:new        0.10(0.09+0.01)         0.28(0.13+0.15) +180.0%
5400.18: clone_bitmap receive-pack updated:main       0.10(0.08+0.02)         0.28(0.12+0.16) +180.0%
5400.20: clone_bitmap receive-pack main~10:main       0.13(0.11+0.02)         0.26(0.12+0.14) +100.0%
5400.22: clone_bitmap receive-pack :main              0.01(0.01+0.00)         0.01(0.01+0.00) +0.0%
5400.25: extrarefs receive-pack updated:new           32.14(20.76+11.59)      32.35(20.52+12.03) +0.7%
5400.27: extrarefs receive-pack updated:main          32.08(20.54+11.75)      32.10(20.78+11.53) +0.1%
5400.29: extrarefs receive-pack main~10:main          32.14(20.66+11.68)      32.27(20.65+11.83) +0.4%
5400.31: extrarefs receive-pack :main                 7.09(3.56+3.53)         7.10(3.70+3.40) +0.1%
5400.34: extrarefs_bitmap receive-pack updated:new    32.41(20.59+12.02)      7.36(3.76+3.60) -77.3%
5400.36: extrarefs_bitmap receive-pack updated:main   32.26(20.53+11.95)      7.34(3.56+3.78) -77.2%
5400.38: extrarefs_bitmap receive-pack main~10:main   32.44(20.77+11.90)      7.40(3.70+3.70) -77.2%
5400.40: extrarefs_bitmap receive-pack :main          7.09(3.62+3.46)         7.17(3.79+3.38) +1.1%

As expected, performance doesn't change in cases where we do not have a
bitmap available given that the old code path still kicks in. In case we
do have bitmaps, this is kind of a mixed bag: while git-receive-pack(1)
is slower in a "normal" clone of linux.git, it is significantly faster
for a clone with lots of references. The slowness can potentially be
explained by the overhead of loading the bitmap. On the other hand, the
new code is faster as expected in repos which have lots of references
given that we do not have to mark all negative references anymore.

Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
---
 connected.c   | 136 ++++++++++++++++++++++++++++++++++++++++++++++++++
 pack-bitmap.c |   4 +-
 pack-bitmap.h |   2 +
 3 files changed, 140 insertions(+), 2 deletions(-)

diff --git a/connected.c b/connected.c
index b18299fdf0..474275a52f 100644
--- a/connected.c
+++ b/connected.c
@@ -6,6 +6,127 @@
 #include "transport.h"
 #include "packfile.h"
 #include "promisor-remote.h"
+#include "object.h"
+#include "tree-walk.h"
+#include "commit.h"
+#include "tag.h"
+#include "progress.h"
+#include "oidset.h"
+#include "pack-bitmap.h"
+
+#define QUEUED (1u<<0)
+
+static int queue_object(struct repository *repo,
+			struct bitmap_index *bitmap,
+			struct object_array *pending,
+			const struct object_id *oid)
+{
+	struct object *object;
+
+	/*
+	 * If the object ID is part of the bitmap, then we know that it must by
+	 * definition be reachable in the target repository and be fully
+	 * connected. We can thus skip checking the objects' referenced
+	 * objects.
+	 */
+	if (bitmap_position(bitmap, oid) >= 0)
+		return 0;
+
+	/* Otherwise the object is queued up for a connectivity check. */
+	object = parse_object(repo, oid);
+	if (!object) {
+		/* Promisor objects do not need to be traversed. */
+		if (is_promisor_object(oid))
+			return 0;
+		return -1;
+	}
+
+	/*
+	 * If the object has been queued before already, then we don't need to
+	 * do so again.
+	 */
+	if (object->flags & QUEUED)
+		return 0;
+	object->flags |= QUEUED;
+
+	/*
+	 * We do not need to queue up blobs given that they don't reference any
+	 * other objects anyway.
+	 */
+	if (object->type == OBJ_BLOB)
+		return 0;
+
+	add_object_array(object, NULL, pending);
+
+	return 0;
+}
+
+static int check_object(
+	struct repository *repo,
+	struct bitmap_index *bitmap,
+	struct object_array *pending,
+	const struct object *object)
+{
+	int ret = 0;
+
+	if (object->type == OBJ_TREE) {
+		struct tree *tree = (struct tree *)object;
+		struct tree_desc tree_desc;
+		struct name_entry entry;
+
+		if (init_tree_desc_gently(&tree_desc, tree->buffer, tree->size))
+			return error(_("cannot parse tree entry"));
+		while (tree_entry_gently(&tree_desc, &entry))
+			ret |= queue_object(repo, bitmap, pending, &entry.oid);
+
+		free_tree_buffer(tree);
+	} else if (object->type == OBJ_COMMIT) {
+		struct commit *commit = (struct commit *) object;
+		struct commit_list *parents;
+
+		for (parents = commit->parents; parents; parents = parents->next)
+			ret |= queue_object(repo, bitmap, pending, &parents->item->object.oid);
+
+		free_commit_buffer(repo->parsed_objects, commit);
+	} else if (object->type == OBJ_TAG) {
+		ret |= queue_object(repo, bitmap, pending, get_tagged_oid((struct tag *) object));
+	} else {
+		return error(_("unexpected object type"));
+	}
+
+	return ret;
+}
+
+/*
+ * If the target repository has a bitmap, then we treat all objects reachable
+ * via the bitmap as fully connected. Bitmapped objects thus serve as the
+ * boundary between old and new objects.
+ */
+static int check_connected_with_bitmap(struct repository *repo,
+				       struct bitmap_index *bitmap,
+				       oid_iterate_fn fn, void *cb_data,
+				       struct check_connected_options *opt)
+{
+	struct object_array pending = OBJECT_ARRAY_INIT;
+	struct progress *progress = NULL;
+	size_t checked_objects = 0;
+	struct object_id oid;
+	int ret = 0;
+
+	if (opt->progress)
+		progress = start_delayed_progress("Checking connectivity", 0);
+
+	while (!fn(cb_data, &oid))
+		ret |= queue_object(repo, bitmap, &pending, &oid);
+	while (pending.nr) {
+		ret |= check_object(repo, bitmap, &pending, object_array_pop(&pending));
+		display_progress(progress, ++checked_objects);
+	}
+
+	stop_progress(&progress);
+	object_array_clear(&pending);
+	return ret;
+}
 
 /*
  * If we feed all the commits we want to verify to this command
@@ -28,12 +149,27 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
 	int err = 0;
 	struct packed_git *new_pack = NULL;
 	struct transport *transport;
+	struct bitmap_index *bitmap;
 	size_t base_len;
 
 	if (!opt)
 		opt = &defaults;
 	transport = opt->transport;
 
+	bitmap = prepare_bitmap_git(the_repository);
+	if (bitmap) {
+		/*
+		 * If we have a bitmap, then we can reuse the bitmap as
+		 * boundary between old and new objects.
+		 */
+		err = check_connected_with_bitmap(the_repository, bitmap,
+						  fn, cb_data, opt);
+		if (opt->err_fd)
+			close(opt->err_fd);
+		free_bitmap_index(bitmap);
+		return err;
+	}
+
 	if (fn(cb_data, &oid)) {
 		if (opt->err_fd)
 			close(opt->err_fd);
diff --git a/pack-bitmap.c b/pack-bitmap.c
index d90e1d9d8c..d88a882ee1 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -418,8 +418,8 @@ static inline int bitmap_position_packfile(struct bitmap_index *bitmap_git,
 	return pos;
 }
 
-static int bitmap_position(struct bitmap_index *bitmap_git,
-			   const struct object_id *oid)
+int bitmap_position(struct bitmap_index *bitmap_git,
+		    const struct object_id *oid)
 {
 	int pos = bitmap_position_packfile(bitmap_git, oid);
 	return (pos >= 0) ? pos : bitmap_position_extended(bitmap_git, oid);
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 99d733eb26..7b4b386107 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -63,6 +63,8 @@ int rebuild_existing_bitmaps(struct bitmap_index *, struct packing_data *mapping
 void free_bitmap_index(struct bitmap_index *);
 int bitmap_walk_contains(struct bitmap_index *,
 			 struct bitmap *bitmap, const struct object_id *oid);
+int bitmap_position(struct bitmap_index *bitmap_git,
+		    const struct object_id *oid);
 
 /*
  * After a traversal has been performed by prepare_bitmap_walk(), this can be
-- 
2.32.0

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux