[PATCH v2 3/3] receive-pack: only use visible refs for connectivity check

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



When serving a push, git-receive-pack(1) needs to verify that the
packfile sent by the client contains all objects that are required by
the updated references. This connectivity check works by marking all
preexisting references as uninteresting and using the new reference tips
as starting point for a graph walk.

Marking all preexisting references as uninteresting can be a problem
when it comes to performance. Git forges tend to do internal bookkeeping
to keep alive sets of objects for internal use or make them easy to find
via certain references. These references are typically hidden away from
the user so that they are neither advertised nor writeable. At GitLab,
we have one particular repository that contains a total of 7 million
references, of which 6.8 million are indeed internal references. With
the current connectivity check we are forced to load all these
references in order to mark them as uninteresting, and this alone takes
around 15 seconds to compute.

We can optimize this by only taking into account the set of visible refs
when marking objects as uninteresting. This means that we may now walk
more objects until we hit any object that is marked as uninteresting.
But it is rather unlikely that clients send objects that make large
parts of objects reachable that have previously only ever been hidden,
whereas the common case is to push incremental changes that build on top
of the visible object graph.

This provides a huge boost to performance in the mentioned repository,
where the vast majority of its refs hidden. Pushing a new commit into
this repo with `transfer.hideRefs` set up to hide 6.8 million of 7 refs
as it is configured in Gitaly leads to an almost 4.5-fold speedup:

    Benchmark 1: main
      Time (mean ± σ):     29.475 s ±  0.248 s    [User: 28.812 s, System: 1.006 s]
      Range (min … max):   29.189 s … 29.636 s    3 runs

    Benchmark 2: pks-connectivity-check-hide-refs
      Time (mean ± σ):      6.657 s ±  0.027 s    [User: 6.664 s, System: 0.355 s]
      Range (min … max):    6.629 s …  6.682 s    3 runs

    Summary
      'pks-connectivity-check-hide-refs' ran
        4.43 ± 0.04 times faster than 'main'

As we mostly go through the same codepaths even in the case where there
are no hidden refs at all compared to the code before there is no change
in performance when no refs are hidden:

    Benchmark 1: main
      Time (mean ± σ):     48.688 s ±  1.535 s    [User: 49.579 s, System: 5.058 s]
      Range (min … max):   47.518 s … 50.425 s    3 runs

    Benchmark 2: pks-connectivity-check-hide-refs
      Time (mean ± σ):     48.147 s ±  0.716 s    [User: 48.800 s, System: 5.297 s]
      Range (min … max):   47.694 s … 48.973 s    3 runs

    Summary
      'pks-connectivity-check-hide-refs' ran
        1.01 ± 0.04 times faster than 'main'

Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
---
 builtin/receive-pack.c | 2 ++
 connected.c            | 5 ++++-
 connected.h            | 6 ++++++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 1f3efc58fb..be1c1d2702 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -1929,6 +1929,8 @@ static void execute_commands(struct command *commands,
 	opt.err_fd = err_fd;
 	opt.progress = err_fd && !quiet;
 	opt.env = tmp_objdir_env(tmp_objdir);
+	opt.visible_refs_section = "receive";
+
 	if (check_connected(iterate_receive_command_list, &data, &opt))
 		set_connectivity_errors(commands, si);
 
diff --git a/connected.c b/connected.c
index 74a20cb32e..c64501f755 100644
--- a/connected.c
+++ b/connected.c
@@ -100,7 +100,10 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
 		strvec_push(&rev_list.args, "--exclude-promisor-objects");
 	if (!opt->is_deepening_fetch) {
 		strvec_push(&rev_list.args, "--not");
-		strvec_push(&rev_list.args, "--all");
+		if (opt->visible_refs_section)
+			strvec_pushf(&rev_list.args, "--visible-refs=%s", opt->visible_refs_section);
+		else
+			strvec_push(&rev_list.args, "--all");
 	}
 	strvec_push(&rev_list.args, "--quiet");
 	strvec_push(&rev_list.args, "--alternate-refs");
diff --git a/connected.h b/connected.h
index 6e59c92aa3..d8396e5d55 100644
--- a/connected.h
+++ b/connected.h
@@ -46,6 +46,12 @@ struct check_connected_options {
 	 * during a fetch.
 	 */
 	unsigned is_deepening_fetch : 1;
+
+	/*
+	 * If not NULL, use `--visible-refs=$section` instead of `--not --all`
+	 * as the set of already-reachable references.
+	 */
+	const char *visible_refs_section;
 };
 
 #define CHECK_CONNECTED_INIT { 0 }
-- 
2.38.1

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux