On 11.08.2018 19:23, Jeff King wrote:
On Sat, Aug 11, 2018 at 01:02:48PM -0400, Jeff King wrote:
- we could probably improve the speed of oidset. Two things I notice
about its implementation:
- it has to malloc for each entry, which I suspect is the main
bottleneck. We could probably pool-allocate blocks, and when
entries get removed just leave the allocations in place until we
clear(). Most callers tend to build up a set and then query it a
lot, or possibly remove items from the set until it's empty. But
my guess is that few or none want a long-lived set that they add
and remove from randomly.
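The pool-allocation idea above can be sketched roughly like this. All names here (entry_pool, pool_alloc, pool_clear) and the block size are hypothetical, not git's mem_pool API; it assumes fixed-size entries no larger than a block, allocated in bulk and only freed all at once on clear:

```c
#include <stddef.h>
#include <stdlib.h>

/* A chain of fixed-size blocks; entries are carved out of the
 * newest block and never freed individually. */
struct pool_block {
	struct pool_block *next;
	size_t used;
	unsigned char data[4096];
};

struct entry_pool {
	struct pool_block *head;
};

/* Assumes size <= sizeof(((struct pool_block *)0)->data). */
static void *pool_alloc(struct entry_pool *p, size_t size)
{
	struct pool_block *b = p->head;
	if (!b || b->used + size > sizeof(b->data)) {
		b = calloc(1, sizeof(*b));
		if (!b)
			return NULL;
		b->next = p->head;
		p->head = b;
	}
	b->used += size;
	return b->data + b->used - size;
}

/* Release every block at once, matching the "leave allocations in
 * place until clear()" idea above. */
static void pool_clear(struct entry_pool *p)
{
	struct pool_block *b = p->head;
	while (b) {
		struct pool_block *next = b->next;
		free(b);
		b = next;
	}
	p->head = NULL;
}
```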
- insertion lets you do check-and-insert as a single operation
(something I failed to notice in [1]). But it's not implemented
as efficiently as it could be, since the "contains" and "put"
operations do separate lookups. This doesn't matter for a set
that's queried a lot more, but for something like de-duping
(like I was doing in [1]) most operations are check-and-insert.
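The single-lookup version of check-and-insert can be sketched with a toy open-addressing set (not git's oidset/oidmap implementation): one probe sequence both answers "contains" and, on a miss, ends exactly at the slot where the insert belongs, so no second lookup is needed. This sketch assumes the table never fills up:

```c
#include <stddef.h>

#define SET_SLOTS 64

/* Toy fixed-capacity open-addressing set of unsigned keys. */
struct tiny_set {
	unsigned keys[SET_SLOTS];
	unsigned char used[SET_SLOTS];
};

/* Returns 1 if key was already present, 0 if it was inserted now.
 * The same probe loop serves as both "contains" and "put". */
static int set_check_and_insert(struct tiny_set *s, unsigned key)
{
	size_t i = key % SET_SLOTS;
	while (s->used[i]) {
		if (s->keys[i] == key)
			return 1;	/* found during the probe */
		i = (i + 1) % SET_SLOTS;
	}
	s->used[i] = 1;			/* probe ended at the free slot */
	s->keys[i] = key;
	return 0;
}
```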
[...]
[1] https://public-inbox.org/git/20180810232457.GG19875@xxxxxxxxxxxxxxxxxxxxx/
but note that it's buried pretty deep.
Some notes on this, based on the cat-file patch that I linked to.
Before any optimizations, my best-of-five timing for:
git cat-file --batch-all-objects --unordered --buffer \
--batch-check='%(objectname)' >/dev/null
in git.git was:
real 0m0.434s
user 0m0.414s
sys 0m0.020s
That's enumerating every object in the repo but not doing much more than
de-duping the names and printing them.
Applying this patch:
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 45992c9be9..04b5cda191 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -443,9 +443,8 @@ static int batch_unordered_object(const struct object_id *oid, void *vdata)
{
struct object_cb_data *data = vdata;
- if (oidset_contains(data->seen, oid))
+ if (oidset_insert(data->seen, oid))
return 0;
- oidset_insert(data->seen, oid);
return batch_object_cb(oid, data);
}
to use the single-call check-and-insert doesn't seem to help (any
improvement is within the run-to-run noise). So saving a single hash
lookup per object does not seem to be measurable, and thus teaching
oidset_insert() to do a single hash lookup for check-and-insert is
unlikely to help us.
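The caller-side pattern from the hunk above can be shown standalone: an insert that reports prior membership dedups in one call. The sketch below models oidset_insert()'s contract (nonzero means "was already there") with a hypothetical bitmap over small integer keys, not git's actual API:

```c
#include <stddef.h>

/* Returns nonzero if key had already been seen; marks it either way.
 * "seen" is a hypothetical 256-entry bitmap standing in for oidset. */
static int seen_insert(unsigned char *seen, unsigned key)
{
	int was_present = seen[key];
	seen[key] = 1;
	return was_present;
}

/* Dedup loop in the style of batch_unordered_object() after the
 * patch: a duplicate makes us skip, otherwise we process the key. */
static size_t count_unique(const unsigned *keys, size_t n)
{
	unsigned char seen[256] = { 0 };
	size_t unique = 0;
	size_t i;
	for (i = 0; i < n; i++)
		if (!seen_insert(seen, keys[i] % 256))
			unique++;
	return unique;
}
```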
On top of that, I tried using a pool to store the set entries:
diff --git a/oidset.c b/oidset.c
index 454c54f933..504929f177 100644
--- a/oidset.c
+++ b/oidset.c
@@ -17,7 +17,10 @@ int oidset_insert(struct oidset *set, const struct object_id *oid)
else if (oidset_contains(set, oid))
return 1;
- entry = xmalloc(sizeof(*entry));
+ if (!set->pool)
+ mem_pool_init(&set->pool, 0);
+
+ entry = mem_pool_alloc(set->pool, sizeof(*entry));
oidcpy(&entry->oid, oid);
oidmap_put(&set->map, entry);
@@ -29,12 +32,13 @@ int oidset_remove(struct oidset *set, const struct object_id *oid)
struct oidmap_entry *entry;
entry = oidmap_remove(&set->map, oid);
- free(entry);
+ /* abandon pool memory for "entry" */
return (entry != NULL);
}
void oidset_clear(struct oidset *set)
{
- oidmap_free(&set->map, 1);
+ oidmap_free(&set->map, 0);
+ mem_pool_discard(set->pool, 0);
}
diff --git a/oidset.h b/oidset.h
index 40ec5f87fe..6b8b802987 100644
--- a/oidset.h
+++ b/oidset.h
@@ -20,6 +20,7 @@
*/
struct oidset {
struct oidmap map;
+ struct mem_pool *pool;
};
#define OIDSET_INIT { OIDMAP_INIT }
That drops my best-of-five to:
real 0m0.300s
user 0m0.288s
sys 0m0.012s
which is roughly a 30% speedup. So that does seem worth pursuing.
For reference, the oid_array code path for cat-file is still:
real 0m0.161s
user 0m0.157s
sys 0m0.004s
but that's not completely apples to apples. The oidset code is also
iterating the packfiles in a different order and generating a revidx
(which I know is about 25ms in this repo). So a better test would
actually swap one data structure out for the other with no other changes
(I just happened to have this test handy, and really wanted to know
whether the mem_pool stuff would help).
Getting sidetracked here, but the following patch helps both sides a bit:
-- >8 --
Subject: [PATCH] cat-file: reuse strbuf in batch_object_write()
Avoid allocating and then releasing memory for constructing the output
for each object by reusing the strbuf for all of them.
Signed-off-by: Rene Scharfe <l.s.r@xxxxxx>
---
# on git.git
$ hyperfine "./git-cat-file --batch-all-objects --buffer --batch-check='%(objectname)'"
Before:
Benchmark #1: ./git-cat-file --batch-all-objects --buffer --batch-check='%(objectname)'
Time (mean ± σ): 139.3 ms ± 20.1 ms [User: 124.2 ms, System: 14.8 ms]
Range (min … max): 124.4 ms … 205.9 ms
After:
Benchmark #1: ./git-cat-file --batch-all-objects --buffer --batch-check='%(objectname)'
Time (mean ± σ): 115.1 ms ± 20.6 ms [User: 102.0 ms, System: 12.8 ms]
Range (min … max): 99.6 ms … 198.1 ms
Test done on a small VM -- the measurements are quite noisy.
builtin/cat-file.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 4a44b2404f..a979fc1f3a 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -211,6 +211,12 @@ struct expand_data {
* optimized out.
*/
unsigned skip_object_info : 1;
+
+ /*
+ * Scratch space for expanded string; shared between invocations
+ * to reduce the number of memory allocations.
+ */
+ struct strbuf *scratch;
};
static int is_atom(const char *atom, const char *s, int slen)
@@ -340,8 +346,6 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
static void batch_object_write(const char *obj_name, struct batch_options *opt,
struct expand_data *data)
{
- struct strbuf buf = STRBUF_INIT;
-
if (!data->skip_object_info &&
oid_object_info_extended(the_repository, &data->oid, &data->info,
OBJECT_INFO_LOOKUP_REPLACE) < 0) {
@@ -351,10 +355,10 @@ static void batch_object_write(const char *obj_name, struct batch_options *opt,
return;
}
- strbuf_expand(&buf, opt->format, expand_format, data);
- strbuf_addch(&buf, '\n');
- batch_write(opt, buf.buf, buf.len);
- strbuf_release(&buf);
+ strbuf_reset(data->scratch);
+ strbuf_expand(data->scratch, opt->format, expand_format, data);
+ strbuf_addch(data->scratch, '\n');
+ batch_write(opt, data->scratch->buf, data->scratch->len);
if (opt->print_contents) {
print_object_or_die(opt, data);
@@ -453,6 +457,7 @@ static int batch_objects(struct batch_options *opt)
* object.
*/
memset(&data, 0, sizeof(data));
+ data.scratch = &buf;
data.mark_query = 1;
strbuf_expand(&buf, opt->format, expand_format, &data);
data.mark_query = 0;
--
2.18.0