[PATCH v2 08/11] receive-pack: use oidset to de-duplicate .have lines

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



If you have an alternate object store with a very large
number of refs, the peak memory usage of the sha1_array can
grow high, even if most of them are duplicates that end up
not being printed at all.

The similar for_each_alternate_ref() code-paths in
fetch-pack solve this by using flags in "struct object" to
de-duplicate (and so are relying on obj_hash at the core).

But we don't have a "struct object" at all in this case. We
could call lookup_unknown_object() to get one, but if our
goal is reducing memory footprint, it's not great:

 - an unknown object is as large as the largest object type
   (a commit), which is bigger than an oidset entry

 - we can free the memory after our ref advertisement, but
   "struct object" entries persist forever (and the
   receive-pack may hang around for a long time, as the
   bottleneck is often client upload bandwidth).

So let's use an oidset. Note that unlike a sha1-array it
doesn't sort the output as a side effect. However, our
output is at least stable, because for_each_alternate_ref()
will give us the sha1s in ref-sorted order.

In one particularly pathological case with an alternate that
has 60,000 unique refs out of 80 million total, this reduced
the peak heap usage of "git receive-pack . </dev/null" from
13GB to 14MB.

Signed-off-by: Jeff King <peff@xxxxxxxx>
---
 builtin/receive-pack.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index d21332d9e..a4926fcfb 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -21,6 +21,7 @@
 #include "sigchain.h"
 #include "fsck.h"
 #include "tmp-objdir.h"
+#include "oidset.h"
 
 static const char * const receive_pack_usage[] = {
 	N_("git receive-pack <git-dir>"),
@@ -271,27 +272,24 @@ static int show_ref_cb(const char *path_full, const struct object_id *oid,
 	return 0;
 }
 
-static int show_one_alternate_sha1(const unsigned char sha1[20], void *unused)
+static void show_one_alternate_ref(const char *refname,
+				   const struct object_id *oid,
+				   void *data)
 {
-	show_ref(".have", sha1);
-	return 0;
-}
+	struct oidset *seen = data;
 
-static void collect_one_alternate_ref(const char *refname,
-				      const struct object_id *oid,
-				      void *data)
-{
-	struct sha1_array *sa = data;
-	sha1_array_append(sa, oid->hash);
+	if (oidset_insert(seen, oid))
+		return;
+
+	show_ref(".have", oid->hash);
 }
 
 static void write_head_info(void)
 {
-	struct sha1_array sa = SHA1_ARRAY_INIT;
+	static struct oidset seen = OIDSET_INIT;
 
-	for_each_alternate_ref(collect_one_alternate_ref, &sa);
-	sha1_array_for_each_unique(&sa, show_one_alternate_sha1, NULL);
-	sha1_array_clear(&sa);
+	for_each_alternate_ref(show_one_alternate_ref, &seen);
+	oidset_clear(&seen);
 	for_each_ref(show_ref_cb, NULL);
 	if (!sent_capabilities)
 		show_ref("capabilities^{}", null_sha1);
-- 
2.12.0.rc0.371.ga6cf8653b




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]