[PATCH v5 00/16] refs: implement jump lists for packed backend

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Here is another reroll of my series to implement jump (née skip) lists
for the packed refs backend, based on top of the current 'master'.

This responds to a review of the entire series that Peff gave a couple
of weeks ago[1], which should satisfy all of the comments that he
raised, most of which were relatively minor.

The notable changes from last time are:

  - clarifying some commit messages, notably to explicitly state that
    certain clean-ups do not indicate existing bugs, but are defensive
    against potential future ones
  - polishing up some of the API changes, to avoid having
    `match_name_as_path()` take a ref_filter pointer.
  - more timing data and documentation in the first substantive patch
  - clean-up in the upload-pack.c to make the changes clearer

As usual, a range-diff is included below for convenience.

Thanks in advance for your review!

[1]: https://lore.kernel.org/git/20230703062925.GK3502534@xxxxxxxxxxxxxxxxxxxxxxx/

Jeff King (5):
  refs.c: rename `ref_filter`
  ref-filter.h: provide `REF_FILTER_INIT`
  ref-filter: clear reachable list pointers after freeing
  ref-filter: add `ref_filter_clear()`
  ref-filter.c: parameterize match functions over patterns

Taylor Blau (11):
  builtin/for-each-ref.c: add `--exclude` option
  refs: plumb `exclude_patterns` argument throughout
  refs/packed-backend.c: refactor `find_reference_location()`
  refs/packed-backend.c: implement jump lists to avoid excluded
    pattern(s)
  refs/packed-backend.c: add trace2 counters for jump list
  revision.h: store hidden refs in a `strvec`
  refs.h: let `for_each_namespaced_ref()` take excluded patterns
  refs.h: implement `hidden_refs_to_excludes()`
  builtin/receive-pack.c: avoid enumerating hidden references
  upload-pack.c: avoid enumerating hidden refs where possible
  ls-refs.c: avoid enumerating hidden refs where possible

 Documentation/git-for-each-ref.txt |   6 +
 builtin/branch.c                   |   4 +-
 builtin/for-each-ref.c             |   7 +-
 builtin/receive-pack.c             |   8 +-
 builtin/tag.c                      |   4 +-
 http-backend.c                     |   2 +-
 ls-refs.c                          |   7 +-
 ref-filter.c                       |  66 +++++++---
 ref-filter.h                       |  12 ++
 refs.c                             |  85 ++++++++----
 refs.h                             |  29 +++-
 refs/debug.c                       |   5 +-
 refs/files-backend.c               |   5 +-
 refs/packed-backend.c              | 204 +++++++++++++++++++++++++----
 refs/refs-internal.h               |   7 +-
 revision.c                         |   4 +-
 revision.h                         |   5 +-
 t/helper/test-reach.c              |   2 +-
 t/helper/test-ref-store.c          |  10 ++
 t/t0041-usage.sh                   |   1 +
 t/t1419-exclude-refs.sh            | 122 +++++++++++++++++
 t/t3402-rebase-merge.sh            |   1 +
 t/t6300-for-each-ref.sh            |  35 +++++
 trace2.h                           |   2 +
 trace2/tr2_ctr.c                   |   5 +
 upload-pack.c                      |  47 +++++--
 26 files changed, 579 insertions(+), 106 deletions(-)
 create mode 100755 t/t1419-exclude-refs.sh

Range-diff against v4:
 1:  c12def5a30a !  1:  5b5ccc40d6b refs.c: rename `ref_filter`
    @@ Commit message
         refs.c: rename `ref_filter`
     
         The refs machinery has its own implementation of a `ref_filter` (used by
    -    `for-each-ref`), which is distinct from the `ref-filler.h` API (also
    +    `for-each-ref`), which is distinct from the `ref-filter.h` API (also
         used by `for-each-ref`, among other things).
     
         Rename the one within refs.c to more clearly indicate its purpose.
 2:  7ce82b6a5a4 !  2:  0c4e995f1d3 ref-filter.h: provide `REF_FILTER_INIT`
    @@ Commit message
         Provide a sane initialization value for `struct ref_filter`, which in a
         subsequent patch will be used to initialize a new field.
     
    -    In the meantime, fix a case in test-reach.c where its `ref_filter` is
    -    not even zero-initialized.
    +    In the meantime, ensure that the `ref_filter` struct used in the
    +    test-helper's `cmd__reach()` is zero-initialized. The lack of
    +    initialization is OK, since `commit_contains()` only looks at the single
    +    `with_commit_tag_algo` field that *is* initialized directly above.
    +
    +    So this does not fix a bug, but rather prevents one from biting us in
    +    the future.
     
         Signed-off-by: Jeff King <peff@xxxxxxxx>
         Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx>
 3:  7e6bf7766d0 !  3:  cea1e88d358 ref-filter: clear reachable list pointers after freeing
    @@ Metadata
      ## Commit message ##
         ref-filter: clear reachable list pointers after freeing
     
    -    In reach_filter(), we pop all commits from the reachable lists, leaving
    -    them empty. But because we're operating on a list pointer that was
    -    passed by value, the original filter.reachable_from pointer is left
    -    dangling.
    +    In `reach_filter()`, we pop all commits from the reachable lists,
    +    leaving them empty. But because we're operating on a list pointer that
    +    was passed by value, the original `filter.reachable_from` and
    +    `filter.unreachable_from` pointers are left dangling.
    +
    +    As is the case with the previous commit, nobody touches either of these
    +    fields after calling `reach_filter()`, so leaving them dangling is OK.
    +    But this future proofs against dangerous situations.
     
         Signed-off-by: Jeff King <peff@xxxxxxxx>
         Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx>
 4:  777e71004d6 =  4:  00a4532d54d ref-filter: add `ref_filter_clear()`
 5:  39e9b0f50d0 !  5:  3ab03ac20df ref-filter.c: parameterize match functions over patterns
    @@ ref-filter.c: static int get_ref_atom_value(struct ref_array_item *ref, int atom
       */
     -static int match_pattern(const struct ref_filter *filter, const char *refname)
     +static int match_pattern(const char **patterns, const char *refname,
    -+			 const int ignore_case)
    ++			 int ignore_case)
      {
     -	const char **patterns = filter->name_patterns;
      	unsigned flags = 0;
    @@ ref-filter.c: static int match_pattern(const struct ref_filter *filter, const ch
       * wildcard (e.g. the same ref matches "refs/heads/m*", too).
       */
     -static int match_name_as_path(const struct ref_filter *filter, const char *refname)
    -+static int match_name_as_path(const struct ref_filter *filter,
    -+			      const char **pattern,
    -+			      const char *refname)
    ++static int match_name_as_path(const char **pattern, const char *refname,
    ++			      int ignore_case)
      {
     -	const char **pattern = filter->name_patterns;
      	int namelen = strlen(refname);
      	unsigned flags = WM_PATHNAME;
      
    +-	if (filter->ignore_case)
    ++	if (ignore_case)
    + 		flags |= WM_CASEFOLD;
    + 
    + 	for (; *pattern; pattern++) {
     @@ ref-filter.c: static int filter_pattern_match(struct ref_filter *filter, const char *refname)
      	if (!*filter->name_patterns)
      		return 1; /* No pattern always matches */
      	if (filter->match_as_path)
     -		return match_name_as_path(filter, refname);
     -	return match_pattern(filter, refname);
    -+		return match_name_as_path(filter, filter->name_patterns, refname);
    ++		return match_name_as_path(filter->name_patterns, refname,
    ++					  filter->ignore_case);
     +	return match_pattern(filter->name_patterns, refname,
     +			     filter->ignore_case);
      }
 6:  c4fd47fd750 !  6:  aa881ca06fa builtin/for-each-ref.c: add `--exclude` option
    @@ ref-filter.c: static int filter_pattern_match(struct ref_filter *filter, const c
     +	if (!filter->exclude.nr)
     +		return 0;
     +	if (filter->match_as_path)
    -+		return match_name_as_path(filter, filter->exclude.v, refname);
    ++		return match_name_as_path(filter->exclude.v, refname,
    ++					  filter->ignore_case);
     +	return match_pattern(filter->exclude.v, refname, filter->ignore_case);
     +}
     +
 7:  e6b50c50219 =  7:  81e223de0c8 refs: plumb `exclude_patterns` argument throughout
 8:  a0990b2916c =  8:  25c099a528c refs/packed-backend.c: refactor `find_reference_location()`
 9:  386ed468fa7 !  9:  af0ce43cc90 refs/packed-backend.c: implement jump lists to avoid excluded pattern(s)
    @@ Commit message
     
             $ hyperfine \
               'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"' \
    +          'git.prev for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' \
               'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"'
             Benchmark 1: git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"
    -          Time (mean ± σ):     802.7 ms ±   2.1 ms    [User: 691.6 ms, System: 147.0 ms]
    -          Range (min … max):   800.0 ms … 807.7 ms    10 runs
    +          Time (mean ± σ):     798.1 ms ±   3.3 ms    [User: 687.6 ms, System: 146.4 ms]
    +          Range (min … max):   794.5 ms … 805.5 ms    10 runs
     
    -        Benchmark 2: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"
    -          Time (mean ± σ):       4.7 ms ±   0.3 ms    [User: 0.7 ms, System: 4.0 ms]
    -          Range (min … max):     4.3 ms …   6.7 ms    422 runs
    +        Benchmark 2: git.prev for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"
    +          Time (mean ± σ):      98.9 ms ±   1.4 ms    [User: 93.1 ms, System: 5.7 ms]
    +          Range (min … max):    97.0 ms … 104.0 ms    29 runs
    +
    +        Benchmark 3: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"
    +          Time (mean ± σ):       4.5 ms ±   0.2 ms    [User: 0.7 ms, System: 3.8 ms]
    +          Range (min … max):     4.1 ms …   5.8 ms    524 runs
     
             Summary
               'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' ran
    -          172.03 ± 9.60 times faster than 'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"'
    +           21.87 ± 1.05 times faster than 'git.prev for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"'
    +          176.52 ± 8.19 times faster than 'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"'
    +
    +    (Comparing stock git and this patch isn't quite fair, since an earlier
    +    commit in this series adds a naive implementation of the `--exclude`
    +    option. `git.prev` is built from the previous commit and includes this
    +    naive implementation).
     
         Using the jump list is fairly straightforward (see the changes to
         `refs/packed-backend.c::next_record()`), but constructing the list is
    @@ ref-filter.c: static int for_each_fullref_in_pattern(struct ref_filter *filter,
      
      /*
     
    + ## refs.h ##
    +@@ refs.h: int for_each_ref(each_ref_fn fn, void *cb_data);
    +  */
    + int for_each_ref_in(const char *prefix, each_ref_fn fn, void *cb_data);
    + 
    ++/*
    ++ * references matching any pattern in "exclude_patterns" are omitted from the
    ++ * result set on a best-effort basis.
    ++ */
    + int refs_for_each_fullref_in(struct ref_store *refs, const char *prefix,
    + 			     const char **exclude_patterns,
    + 			     each_ref_fn fn, void *cb_data);
    +
      ## refs/packed-backend.c ##
     @@ refs/packed-backend.c: static int cmp_packed_ref_records(const void *v1, const void *v2)
       * Compare a snapshot record at `rec` to the specified NUL-terminated
    @@ refs/packed-backend.c: struct packed_ref_iterator {
     +		iter->jump_cur++;
     +		if (iter->pos < curr->end) {
     +			iter->pos = curr->end;
    ++			/* jumps are coalesced, so only one jump is necessary */
     +			break;
     +		}
     +	}
    @@ refs/packed-backend.c: static struct ref_iterator_vtable packed_ref_iterator_vta
     +
     +	for (pattern = excluded_patterns; *pattern; pattern++) {
     +		struct jump_list_entry *e;
    ++		const char *start, *end;
     +
     +		/*
     +		 * We can't feed any excludes with globs in them to the
    @@ refs/packed-backend.c: static struct ref_iterator_vtable packed_ref_iterator_vta
     +		if (has_glob_special(*pattern))
     +			continue;
     +
    ++		start = find_reference_location(snapshot, *pattern, 0);
    ++		end = find_reference_location_end(snapshot, *pattern, 0);
    ++
    ++		if (start == end)
    ++			continue; /* nothing to jump over */
    ++
     +		ALLOC_GROW(iter->jump, iter->jump_nr + 1, iter->jump_alloc);
     +
     +		e = &iter->jump[iter->jump_nr++];
    -+		e->start = find_reference_location(snapshot, *pattern, 0);
    -+		e->end = find_reference_location_end(snapshot, *pattern, 0);
    ++		e->start = start;
    ++		e->end = end;
     +	}
     +
     +	if (!iter->jump_nr) {
    @@ refs/packed-backend.c: static struct ref_iterator_vtable packed_ref_iterator_vta
     +
     +	for (i = 1, j = 1; i < iter->jump_nr; i++) {
     +		struct jump_list_entry *ours = &iter->jump[i];
    -+
    -+		if (ours->start == ours->end) {
    -+			/* ignore empty regions (no matching entries) */
    -+			continue;
    -+		} else if (ours->start <= last_disjoint->end) {
    ++		if (ours->start <= last_disjoint->end) {
     +			/* overlapping regions extend the previous one */
     +			last_disjoint->end = last_disjoint->end > ours->end
     +				? last_disjoint->end : ours->end;
    @@ refs/packed-backend.c: static struct ref_iterator_vtable packed_ref_iterator_vta
     +			/* otherwise, insert a new region */
     +			iter->jump[j++] = *ours;
     +			last_disjoint = ours;
    -+
     +		}
     +	}
     +
10:  49c8f5173aa ! 10:  e150941c1d1 refs/packed-backend.c: add trace2 counters for jump list
    @@ Commit message
     
      ## refs/packed-backend.c ##
     @@
    - #include "../chdir-notify.h"
    + #include "../statinfo.h"
      #include "../wrapper.h"
      #include "../write-or-die.h"
     +#include "../trace2.h"
    @@ refs/packed-backend.c: static int next_record(struct packed_ref_iterator *iter)
      		if (iter->pos < curr->end) {
      			iter->pos = curr->end;
     +			trace2_counter_add(TRACE2_COUNTER_ID_PACKED_REFS_JUMPS, 1);
    + 			/* jumps are coalesced, so only one jump is necessary */
      			break;
      		}
    - 	}
     
      ## t/t1419-exclude-refs.sh ##
     @@ t/t1419-exclude-refs.sh: TEST_PASSES_SANITIZE_LEAK=true
11:  dd856a3982b = 11:  a5093d52008 revision.h: store hidden refs in a `strvec`
12:  845904853ee <  -:  ----------- refs/packed-backend.c: ignore complicated hidden refs rules
13:  8d4d7cc22ee ! 12:  7e72c23c8a4 refs.h: let `for_each_namespaced_ref()` take excluded patterns
    @@ refs.h: int for_each_glob_ref_in(each_ref_fn fn, const char *pattern,
      
      int head_ref_namespaced(each_ref_fn fn, void *cb_data);
     -int for_each_namespaced_ref(each_ref_fn fn, void *cb_data);
    ++/*
    ++ * references matching any pattern in "exclude_patterns" are omitted from the
    ++ * result set on a best-effort basis.
    ++ */
     +int for_each_namespaced_ref(const char **exclude_patterns,
     +			    each_ref_fn fn, void *cb_data);
      
 -:  ----------- > 13:  f99d89d53b9 refs.h: implement `hidden_refs_to_excludes()`
14:  49c665f9f8f ! 14:  96aada36f9b builtin/receive-pack.c: avoid enumerating hidden references
    @@ builtin/receive-pack.c: static void write_head_info(void)
      
     -	for_each_ref(show_ref_cb, &seen);
     +	refs_for_each_fullref_in(get_main_ref_store(the_repository), "",
    -+				 hidden_refs.v, show_ref_cb, &seen);
    ++				 hidden_refs_to_excludes(&hidden_refs),
    ++				 show_ref_cb, &seen);
      	for_each_alternate_ref(show_one_alternate_ref, &seen);
      	oidset_clear(&seen);
      	if (!sent_capabilities)
15:  19bf4a52d69 ! 15:  8544a647798 upload-pack.c: avoid enumerating hidden refs where possible
    @@ upload-pack.c: static int get_common_commits(struct upload_pack_data *data,
      
     +static int allow_hidden_refs(enum allow_uor allow_uor)
     +{
    -+	return allow_uor & (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1);
    ++	if ((allow_uor & ALLOW_ANY_SHA1) == ALLOW_ANY_SHA1)
    ++		return 1;
    ++	return !(allow_uor & (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1));
     +}
     +
     +static void for_each_namespaced_ref_1(each_ref_fn fn,
     +				      struct upload_pack_data *data)
     +{
    ++	const char **excludes = NULL;
     +	/*
     +	 * If `data->allow_uor` allows fetching hidden refs, we need to
     +	 * mark all references (including hidden ones), to check in
    @@ upload-pack.c: static int get_common_commits(struct upload_pack_data *data,
     +	 * hidden references.
     +	 */
     +	if (allow_hidden_refs(data->allow_uor))
    -+		for_each_namespaced_ref(NULL, fn, data);
    -+	else
    -+		for_each_namespaced_ref(data->hidden_refs.v, fn, data);
    ++		excludes = hidden_refs_to_excludes(&data->hidden_refs);
    ++
    ++	for_each_namespaced_ref(excludes, fn, data);
     +}
    ++
     +
      static int is_our_ref(struct object *o, enum allow_uor allow_uor)
      {
     -	int allow_hidden_ref = (allow_uor &
     -				(ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1));
     -	return o->flags & ((allow_hidden_ref ? HIDDEN_REF : 0) | OUR_REF);
    -+	return o->flags & ((allow_hidden_refs(allow_uor) ? HIDDEN_REF : 0) | OUR_REF);
    ++	return o->flags & ((allow_hidden_refs(allow_uor) ? 0 : HIDDEN_REF) | OUR_REF);
      }
      
      /*
16:  ea6cbaf292f ! 16:  dff068c469f ls-refs.c: avoid enumerating hidden refs where possible
    @@ ls-refs.c: int ls_refs(struct repository *r, struct packet_reader *request)
      	refs_for_each_fullref_in_prefixes(get_main_ref_store(r),
      					  get_git_namespace(), data.prefixes.v,
     -					  NULL, send_ref, &data);
    -+					  data.hidden_refs.v, send_ref, &data);
    ++					  hidden_refs_to_excludes(&data.hidden_refs),
    ++					  send_ref, &data);
      	packet_fflush(stdout);
      	strvec_clear(&data.prefixes);
      	strbuf_release(&data.buf);
-- 
2.41.0.343.gdff068c469f



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux