[PATCH v2 00/10] cat-file speedups

Eric Wong <e@xxxxxxxxx> · Fri, 23 Aug 2024 22:46:20 +0000

This continues the work of Jeff King and my initial work to
speed up cat-file --batch(-contents)? users in
https://lore.kernel.org/git/20240621062915.GA2105230@xxxxxxxxxxxxxxxxxxxxxxx/T/

v1 is here:
https://lore.kernel.org/git/20240715003519.2671385-1-e@xxxxxxxxx/T/

v2 changes:

- attempts to improve various commit messages
  (the human language part of my brain has been pretty broken
  for a few years, now :<)
- expand comments around delta_base_cache_lock
- remove DIRECT_CACHE knob since it's always on
- move `else' arm removal in print_object_or_die from 7/10 to
  8/10 to fix t1006 under bisect

I've kept the assert() calls rather than using BUG() since they're
in easily tested code paths and the tests they perform aren't
useful in release builds.  The assertions should remain useful for
future development if we introduce more caching.

Thanks to Patrick for reviewing v1 and Jeff for the
content_limit work.

Eric Wong (8):
  packfile: fix off-by-one in content_limit comparison
  packfile: inline cache_or_unpack_entry
  cat-file: use delta_base_cache entries directly
  packfile: packed_object_info avoids packed_to_object_type
  object_info: content_limit only applies to blobs
  cat-file: batch-command uses content_limit
  cat-file: batch_write: use size_t for length
  cat-file: use writev(2) if available

Jeff King (2):
  packfile: move sizep computation
  packfile: allow content-limit for cat-file

 Makefile            |   3 ++
 builtin/cat-file.c  | 124 +++++++++++++++++++++++++++++++-------------
 config.mak.uname    |   5 ++
 git-compat-util.h   |  10 ++++
 object-file.c       |  12 +++++
 object-store-ll.h   |   9 ++++
 packfile.c          | 122 ++++++++++++++++++++++++++++---------------
 packfile.h          |   4 ++
 t/t1006-cat-file.sh |  19 +++++--
 wrapper.c           |  18 +++++++
 wrapper.h           |   1 +
 write-or-die.c      |  66 +++++++++++++++++++++++
 write-or-die.h      |   2 +
 13 files changed, 312 insertions(+), 83 deletions(-)

Range-diff against v1:
 1:  36b799ab67 !  1:  b4025cee1f packfile: move sizep computation
    @@ Metadata
      ## Commit message ##
         packfile: move sizep computation

    -    This makes the next commit to avoid redundant object info
    -    lookups easier to understand.
    +    Moving the sizep computation now makes the next commit to avoid
    +    redundant object info lookups easier to understand.  There is
    +    no user-visible change, here.

         [ew: commit message]

    @@ Commit message

      ## packfile.c ##
     @@ packfile.c: int packed_object_info(struct repository *r, struct packed_git *p,
    - 
    - 	/*
    - 	 * We always get the representation type, but only convert it to
    --	 * a "real" type later if the caller is interested.
    -+	 * a "real" type later if the caller is interested. Likewise...
    -+	 * tbd.
    - 	 */
    - 	if (oi->contentp) {
    - 		*oi->contentp = cache_or_unpack_entry(r, p, obj_offset, oi->sizep,
    -@@ packfile.c: int packed_object_info(struct repository *r, struct packed_git *p,
      			type = OBJ_BAD;
      	} else {
      		type = unpack_object_header(p, &w_curs, &curpos, &size);
 2:  50f576ab16 !  2:  bdf6f57fae packfile: allow content-limit for cat-file
    @@ Metadata
      ## Commit message ##
         packfile: allow content-limit for cat-file

    -    This avoids unnecessary round trips to the object store to speed
    +    Avoid unnecessary round trips to the object store to speed
         up cat-file contents retrievals.  The majority of packed objects
         don't benefit from the streaming interface at all and we end up
         having to load them in core anyways to satisfy our streaming
         API.

         This drops the runtime of
    -    `git cat-file --batch-all-objects --unordered --batch' from
    -    ~7.1s to ~6.1s on Jeff's machine.
    +    `git cat-file --batch-all-objects --unordered --batch' on
    +    git.git from ~7.1s to ~6.1s on Jeff's machine.

         [ew: commit message]

    @@ object-store-ll.h: struct object_info {

      ## packfile.c ##
     @@ packfile.c: int packed_object_info(struct repository *r, struct packed_git *p,
    - 	 * a "real" type later if the caller is interested. Likewise...
    - 	 * tbd.
    + 	 * We always get the representation type, but only convert it to
    + 	 * a "real" type later if the caller is interested.
      	 */
     -	if (oi->contentp) {
     +	if (oi->contentp && !oi->content_limit) {
 3:  6eb732401a !  3:  7e762e3481 packfile: fix off-by-one in content_limit comparison
    @@ Commit message
         slurping objects which match loose object handling and slurp
         objects with size matching the content_limit exactly.

    +    This change is merely for consistency with the majority of
    +    existing code and there is no user visible change in nearly all
    +    cases.  The only exception being the corner case when the object
    +    size matches content_limit exactly where users will see a
    +    speedup from avoiding an extra lookup.
    +
         Signed-off-by: Eric Wong <e@xxxxxxxxx>

      ## packfile.c ##
 4:  9476824ac7 !  4:  a558101b85 packfile: inline cache_or_unpack_entry
    @@ Commit message
         packfile: inline cache_or_unpack_entry

         We need to check delta_base_cache anyways to fill in the
    -    `whence' field in `struct object_info'.  Inlining
    -    cache_or_unpack_entry() makes it easier to only do the hashmap
    -    lookup once and avoid a redundant lookup later on.
    +    `whence' field in `struct object_info'.  Inlining (and getting
    +    rid of) cache_or_unpack_entry() makes it easier to only do the
    +    hashmap lookup once and avoid a redundant lookup later on.

         This code reorganization will also make an optimization to
    -    use the cache entry directly easier to implement.
    +    use the cache entry directly easier to implement in the next
    +    commit.

         Signed-off-by: Eric Wong <e@xxxxxxxxx>

    @@ packfile.c: int packed_object_info(struct repository *r, struct packed_git *p,

      	/*
      	 * We always get the representation type, but only convert it to
    - 	 * a "real" type later if the caller is interested. Likewise...
    - 	 * tbd.
    + 	 * a "real" type later if the caller is interested.
      	 */
     -	if (oi->contentp && !oi->content_limit) {
     -		*oi->contentp = cache_or_unpack_entry(r, p, obj_offset, oi->sizep,
 5:  c99dfb84d4 !  5:  74d21ac89d cat-file: use delta_base_cache entries directly
    @@ Commit message
         cat-file: use delta_base_cache entries directly

         For objects already in the delta_base_cache, we can safely use
    -    them directly to avoid the malloc+memcpy+free overhead.
    +    one entry at-a-time directly to avoid the malloc+memcpy+free
    +    overhead.  For a 1MB delta base object, this eliminates the
    +    speed penalty of duplicating large objects into memory and
    +    speeds up those 1MB delta base cached content retrievals by
    +    roughly 30%.

         While only 2-7% of objects are delta bases in repos I've looked
         at, this avoids up to 96MB of duplicated memory in the worst
    -    case with the default git config.  For a more reasonable 1MB
    -    delta base object, this eliminates the speed penalty of
    -    duplicating large objects into memory and speeds up those 1MB
    -    delta base cached content retrievals by roughly 30%.
    +    case with the default git config.

         The new delta_base_cache_lock is a simple single-threaded
    -    assertion to ensure cat-file is the exclusive user of the
    -    delta_base_cache.
    +    assertion to ensure cat-file (and similar) is the exclusive user
    +    of the delta_base_cache.  In other words, we cannot have diff
    +    or similar commands using two or more entries directly from the
    +    delta base cache.  The new lock has nothing to do with parallel
    +    access via multiple threads at the moment.

         Signed-off-by: Eric Wong <e@xxxxxxxxx>

      ## builtin/cat-file.c ##
    -@@
    - #include "promisor-remote.h"
    - #include "mailmap.h"
    - #include "write-or-die.h"
    -+#define USE_DIRECT_CACHE 1
    - 
    - enum batch_mode {
    - 	BATCH_MODE_CONTENTS,
     @@ builtin/cat-file.c: static void print_object_or_die(struct batch_options *opt, struct expand_data *d

      	if (data->content) {
      		batch_write(opt, data->content, data->size);
     -		FREE_AND_NULL(data->content);
     +		switch (data->info.whence) {
    -+		case OI_CACHED: BUG("FIXME OI_CACHED support not done");
    ++		case OI_CACHED:
    ++			/*
    ++			 * only blame uses OI_CACHED atm, so it's unlikely
    ++			 * we'll ever hit this path
    ++			 */
    ++			BUG("TODO OI_CACHED support not done");
     +		case OI_LOOSE:
     +		case OI_PACKED:
     +			FREE_AND_NULL(data->content);
     +			break;
     +		case OI_DBCACHED:
    -+			if (USE_DIRECT_CACHE)
    -+				unlock_delta_base_cache();
    -+			else
    -+				FREE_AND_NULL(data->content);
    ++			unlock_delta_base_cache();
     +		}
      	} else if (data->type == OBJ_BLOB) {
      		if (opt->buffer_output)
    @@ builtin/cat-file.c: static int batch_objects(struct batch_options *opt)
      			data.info.sizep = &data.size;
      			data.info.contentp = &data.content;
      			data.info.content_limit = big_file_threshold;
    -+			data.info.direct_cache = USE_DIRECT_CACHE;
    ++			data.info.direct_cache = 1;
      		}
      	}

    @@ object-store-ll.h: struct object_info {
      	} whence;
     +
     +	/*
    -+	 * set if caller is able to use OI_DBCACHED entries without copying
    -+	 * TODO OI_CACHED if its use goes beyond blame
    ++	 * Set if caller is able to use OI_DBCACHED entries without copying.
    ++	 * This only applies to OI_DBCACHED entries at the moment,
    ++	 * not OI_CACHED or any other type of entry.
     +	 */
     +	unsigned direct_cache:1;
     +
    @@ packfile.c: static enum object_type packed_to_object_type(struct repository *r,
      static struct hashmap delta_base_cache;
      static size_t delta_base_cached;

    -+/* ensures oi->direct_cache is used properly */
    ++/*
    ++ * Ensures only a single object is used at-a-time via oi->direct_cache.
    ++ * Using two objects directly at once (e.g. diff) would cause corruption
    ++ * since populating the cache may invalidate existing entries.
    ++ * This lock has nothing to do with parallelism at the moment.
    ++ */
     +static int delta_base_cache_lock;
     +
      static LIST_HEAD(delta_base_cache_lru);
 6:  79a84221b2 !  6:  83b6367950 packfile: packed_object_info avoids packed_to_object_type
    @@ Metadata
      ## Commit message ##
         packfile: packed_object_info avoids packed_to_object_type

    -    For calls the delta base cache, packed_to_object_type calls
    +    For entries in the delta base cache, packed_to_object_type calls
         can be omitted.  This prepares us to bypass content_limit for
         non-blob types in the following commit.

 7:  63b36d759d !  7:  7e0f8c0cf6 object_info: content_limit only applies to blobs
    @@ builtin/cat-file.c: static void print_object_or_die(struct batch_options *opt, s
     +					data->type == OBJ_TAG)) {
     +			size_t s = size;
     +
    -+			if (USE_DIRECT_CACHE &&
    -+					data->info.whence == OI_DBCACHED) {
    ++			if (data->info.whence == OI_DBCACHED) {
     +				content = xmemdupz(content, s);
     +				data->info.whence = OI_PACKED;
     +			}
    @@ builtin/cat-file.c: static void print_object_or_die(struct batch_options *opt, s
     +
     +		batch_write(opt, content, size);
      		switch (data->info.whence) {
    - 		case OI_CACHED: BUG("FIXME OI_CACHED support not done");
    + 		case OI_CACHED:
    + 			/*
    +@@ builtin/cat-file.c: static void print_object_or_die(struct batch_options *opt, struct expand_data *d
    + 			BUG("TODO OI_CACHED support not done");
      		case OI_LOOSE:
      		case OI_PACKED:
     -			FREE_AND_NULL(data->content);
     +			free(content);
      			break;
      		case OI_DBCACHED:
    - 			if (USE_DIRECT_CACHE)
    - 				unlock_delta_base_cache();
    - 			else
    --				FREE_AND_NULL(data->content);
    -+				free(content);
    - 		}
    --	} else if (data->type == OBJ_BLOB) {
    -+	} else {
    -+		assert(data->type == OBJ_BLOB);
    - 		if (opt->buffer_output)
    - 			fflush(stdout);
    - 		if (opt->transform_mode) {
    -@@ builtin/cat-file.c: static void print_object_or_die(struct batch_options *opt, struct expand_data *d
    - 			stream_blob(oid);
    - 		}
    - 	}
    --	else {
    --		enum object_type type;
    --		unsigned long size;
    --		void *contents;
    --
    --		contents = repo_read_object_file(the_repository, oid, &type,
    --						 &size);
    --		if (!contents)
    --			die("object %s disappeared", oid_to_hex(oid));
    --
    --		if (use_mailmap) {
    --			size_t s = size;
    --			contents = replace_idents_using_mailmap(contents, &s);
    --			size = cast_size_t_to_ulong(s);
    --		}
    --
    --		if (type != data->type)
    --			die("object %s changed type!?", oid_to_hex(oid));
    --		if (data->info.sizep && size != data->size && !use_mailmap)
    --			die("object %s changed size!?", oid_to_hex(oid));
    --
    --		batch_write(opt, contents, size);
    --		free(contents);
    --	}
    - }
    - 
    - static void print_default_format(struct strbuf *scratch, struct expand_data *data,
    + 			unlock_delta_base_cache();

      ## object-file.c ##
     @@ object-file.c: static int loose_object_info(struct repository *r,
    @@ packfile.c: int packed_object_info(struct repository *r, struct packed_git *p,
      				if (!*oi->contentp)
      					type = OBJ_BAD;
      			} else {
    +
    + ## t/t1006-cat-file.sh ##
    +@@ t/t1006-cat-file.sh: test_expect_success 'confirm that neither loose blob is a delta' '
    + 	test_cmp expect actual
    + '
    + 
    ++test_expect_success 'setup delta base tests' '
    ++	foo="$(git rev-parse HEAD:foo)" &&
    ++	foo_plus="$(git rev-parse HEAD:foo-plus)" &&
    ++	git repack -ad
    ++'
    ++
    + # To avoid relying too much on the current delta heuristics,
    + # we will check only that one of the two objects is a delta
    + # against the other, but not the order. We can do so by just
    + # asking for the base of both, and checking whether either
    + # oid appears in the output.
    + test_expect_success '%(deltabase) reports packed delta bases' '
    +-	git repack -ad &&
    + 	git cat-file --batch-check="%(deltabase)" <blobs >actual &&
    + 	{
    +-		grep "$(git rev-parse HEAD:foo)" actual ||
    +-		grep "$(git rev-parse HEAD:foo-plus)" actual
    ++		grep "$foo" actual || grep "$foo_plus" actual
    + 	}
    + '
    + 
    ++test_expect_success 'delta base direct cache use succeeds w/o asserting' '
    ++	commands="info $foo
    ++info $foo_plus
    ++contents $foo_plus
    ++contents $foo" &&
    ++	echo "$commands" >in &&
    ++	git cat-file --batch-command <in >out
    ++'
    ++
    + test_expect_success 'setup bogus data' '
    + 	bogus_short_type="bogus" &&
    + 	bogus_short_content="bogus" &&
 8:  271f6241bd !  8:  ef83e8b426 cat-file: batch-command uses content_limit
    @@ Commit message
         Signed-off-by: Eric Wong <e@xxxxxxxxx>

      ## builtin/cat-file.c ##
    +@@ builtin/cat-file.c: static void print_object_or_die(struct batch_options *opt, struct expand_data *d
    + 		case OI_DBCACHED:
    + 			unlock_delta_base_cache();
    + 		}
    +-	} else if (data->type == OBJ_BLOB) {
    ++	} else {
    ++		assert(data->type == OBJ_BLOB);
    + 		if (opt->buffer_output)
    + 			fflush(stdout);
    + 		if (opt->transform_mode) {
    +@@ builtin/cat-file.c: static void print_object_or_die(struct batch_options *opt, struct expand_data *d
    + 			stream_blob(oid);
    + 		}
    + 	}
    +-	else {
    +-		enum object_type type;
    +-		unsigned long size;
    +-		void *contents;
    +-
    +-		contents = repo_read_object_file(the_repository, oid, &type,
    +-						 &size);
    +-		if (!contents)
    +-			die("object %s disappeared", oid_to_hex(oid));
    +-
    +-		if (use_mailmap) {
    +-			size_t s = size;
    +-			contents = replace_idents_using_mailmap(contents, &s);
    +-			size = cast_size_t_to_ulong(s);
    +-		}
    +-
    +-		if (type != data->type)
    +-			die("object %s changed type!?", oid_to_hex(oid));
    +-		if (data->info.sizep && size != data->size && !use_mailmap)
    +-			die("object %s changed size!?", oid_to_hex(oid));
    +-
    +-		batch_write(opt, contents, size);
    +-		free(contents);
    +-	}
    + }
    + 
    + static void print_default_format(struct strbuf *scratch, struct expand_data *data,
     @@ builtin/cat-file.c: static void parse_cmd_contents(struct batch_options *opt,
      			     struct expand_data *data)
      {
    @@ builtin/cat-file.c: static int batch_objects(struct batch_options *opt)
      		data.info.typep = &data.type;
      		if (!opt->transform_mode) {
      			data.info.sizep = &data.size;
    -
    - ## t/t1006-cat-file.sh ##
    -@@ t/t1006-cat-file.sh: test_expect_success 'confirm that neither loose blob is a delta' '
    - 	test_cmp expect actual
    - '
    - 
    -+test_expect_success 'setup delta base tests' '
    -+	foo="$(git rev-parse HEAD:foo)" &&
    -+	foo_plus="$(git rev-parse HEAD:foo-plus)" &&
    -+	git repack -ad
    -+'
    -+
    - # To avoid relying too much on the current delta heuristics,
    - # we will check only that one of the two objects is a delta
    - # against the other, but not the order. We can do so by just
    - # asking for the base of both, and checking whether either
    - # oid appears in the output.
    - test_expect_success '%(deltabase) reports packed delta bases' '
    --	git repack -ad &&
    - 	git cat-file --batch-check="%(deltabase)" <blobs >actual &&
    - 	{
    --		grep "$(git rev-parse HEAD:foo)" actual ||
    --		grep "$(git rev-parse HEAD:foo-plus)" actual
    -+		grep "$foo" actual || grep "$foo_plus" actual
    - 	}
    - '
    - 
    -+test_expect_success 'delta base direct cache use succeeds w/o asserting' '
    -+	commands="info $foo
    -+info $foo_plus
    -+contents $foo_plus
    -+contents $foo" &&
    -+	echo "$commands" >in &&
    -+	git cat-file --batch-command <in >out
    -+'
    -+
    - test_expect_success 'setup bogus data' '
    - 	bogus_short_type="bogus" &&
    - 	bogus_short_content="bogus" &&
 9:  d91030b69c =  9:  6a94452e54 cat-file: batch_write: use size_t for length
10:  c356b9e1ce ! 10:  1442e43ec7 cat-file: use writev(2) if available
    @@ Metadata
      ## Commit message ##
         cat-file: use writev(2) if available

    -    Using writev here is can be 20-40% faster than three write
    -    syscalls in succession for smaller (1-10k) objects in the delta
    -    base cache.  This advantage decreases as object sizes approach
    -    pipe size (64k on Linux).  This reduces wakeups and syscalls on
    -    the read side, as well, especially if the reader is relying on
    -    non-blocking I/O.
    +    Using writev here is 20-40% faster than three write syscalls in
    +    succession for smaller (1-10k) objects in the delta base cache.
    +    This advantage decreases as object sizes approach pipe size (64k
    +    on Linux).
    +
    +    writev reduces wakeups and syscalls on the read side as well:
    +    each write(2) syscall may trigger one or more corresponding
    +    read(2) syscalls in the reader.  Attempting atomicity in the
    +    writer via writev also reduces the likelyhood of non-blocking
    +    readers failing with EAGAIN and having to call poll||select
    +    before attempting to read again.

         Unfortunately, this turns into a small (1-3%) slowdown for
         gigantic objects of a megabyte or more even with after
    @@ Commit message
         defaults to being compatible with non-blocking stdout and able
         to poll(2) after hitting EAGAIN on write(2).  Using stdio on
         files with the O_NONBLOCK flag is (AFAIK) unspecified and likely
    -    subject to portability problems.
    +    subject to portability problems and thus avoided.

         Signed-off-by: Eric Wong <e@xxxxxxxxx>

    @@ builtin/cat-file.c: static void print_object_or_die(struct batch_options *opt, s
     -		batch_write(opt, content, size);
     +		batch_writev(opt, data, hdr, size);
      		switch (data->info.whence) {
    - 		case OI_CACHED: BUG("FIXME OI_CACHED support not done");
    - 		case OI_LOOSE:
    + 		case OI_CACHED:
    + 			/*
     @@ builtin/cat-file.c: static void print_object_or_die(struct batch_options *opt, struct expand_data *d
      		}
      	} else {
    @@ builtin/cat-file.c: static int batch_objects(struct batch_options *opt)
     -			data.info.contentp = &data.content;
     +			data.info.contentp = &data.iov[1].iov_base;
      			data.info.content_limit = big_file_threshold;
    - 			data.info.direct_cache = USE_DIRECT_CACHE;
    + 			data.info.direct_cache = 1;
      		}

      ## config.mak.uname ##

base-commit: a7dae3bdc8b516d36f630b12bb01e853a667e0d9