[PATCH v3 0/3] Maintenance: adapt custom refspecs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Tom Saeger rightly pointed out [1] that the prefetch task ignores custom
refspecs. This can lead to downloading more data than requested, and it
doesn't even help the future foreground fetches that use that custom
refspec.

[1]
https://lore.kernel.org/git/20210401184914.qmr7jhjbhp2mt3h6@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

This series fixes this problem by carefully replacing the start of each
refspec's destination with "refs/prefetch/". If the destination already
starts with "refs/", then that is replaced. Otherwise "refs/prefetch/" is
just prepended.

This happens inside of git fetch when a --prefetch option is given. This
allows us to maniuplate a struct refspec_item instead of a full refspec
string. It also simplifies our logic in testing the prefetch task.


Update in V3
============

 * The fix is almost completely rewritten as an update to 'git fetch'. See
   the new PATCH 2 for this update.

 * There was some discussion of rewriting test_subcommand, but that can be
   delayed until a proper solution is found to complications around softer
   matches.


Updates in V2
=============

Thanks for the close eye on this series. I appreciate the recommendations,
which I believe I have responded to them all:

 * Fixed typos.
 * Made refspec_item_format() re-entrant. Consumers must free the buffer.
 * Cleaned up style (quoting and tabbing).

Thanks, -Stolee

Derrick Stolee (3):
  maintenance: simplify prefetch logic
  fetch: add --prefetch option
  maintenance: use 'git fetch --prefetch'

 Documentation/fetch-options.txt   |  5 +++
 Documentation/git-maintenance.txt |  6 ++--
 builtin/fetch.c                   | 56 +++++++++++++++++++++++++++++++
 builtin/gc.c                      | 36 +++++---------------
 t/t5582-fetch-negative-refspec.sh | 30 +++++++++++++++++
 t/t7900-maintenance.sh            | 14 ++++----
 6 files changed, 109 insertions(+), 38 deletions(-)


base-commit: 89b43f80a514aee58b662ad606e6352e03eaeee4
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-924%2Fderrickstolee%2Fmaintenance%2Frefspec-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-924/derrickstolee/maintenance/refspec-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/924

Range-diff vs v2:

 1:  5aa0cb06c3f2 = 1:  4c0e983ba56f maintenance: simplify prefetch logic
 2:  d58a3e042ee8 < -:  ------------ test-lib: use exact match for test_subcommand
 3:  96388d949b98 < -:  ------------ refspec: output a refspec item
 4:  bf296282323a < -:  ------------ test-tool: test refspec input/output
 -:  ------------ > 2:  7f488eea6dbd fetch: add --prefetch option
 5:  9592224e3d42 ! 3:  ed055d772452 maintenance: allow custom refspecs during prefetch
     @@ Metadata
      Author: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
      
       ## Commit message ##
     -    maintenance: allow custom refspecs during prefetch
     +    maintenance: use 'git fetch --prefetch'
      
     -    The prefetch task previously used the default refspec source plus a
     -    custom refspec destination to avoid colliding with remote refs:
     +    The 'prefetch' maintenance task previously forced the following refspec
     +    for each remote:
      
                  +refs/heads/*:refs/prefetch/<remote>/*
      
     -    However, some users customize their refspec to reduce how much data they
     -    download from specific remotes. This can involve restrictive patterns
     -    for fetching or negative patterns to avoid downloading some refs.
     +    If a user has specified a more strict refspec for the remote, then this
     +    prefetch task downloads more objects than necessary.
      
     -    Modify fetch_remote() to iterate over the remote's refspec list and
     -    translate that into the appropriate prefetch scenario. Specifically,
     -    re-parse the raw form of the refspec into a new 'struct refspec' and
     -    modify the 'dst' member to replace a leading "refs/" substring with
     -    "refs/prefetch/", or prepend "refs/prefetch/" to 'dst' otherwise.
     -    Negative refspecs do not have a 'dst' so they can be transferred to the
     -    'git fetch' command unmodified.
     -
     -    This prefix change provides the benefit of keeping whatever collisions
     -    may exist in the custom refspecs, if that is a desirable outcome.
     -
     -    This changes the names of the refs that would be fetched by the default
     -    refspec. Instead of "refs/prefetch/<remote>/<branch>" they will now go
     -    to "refs/prefetch/remotes/<remote>/<branch>". While this is a change, it
     -    is not a seriously breaking one: these refs are intended to be hidden
     -    and not used.
     +    The previous change introduced the '--prefetch' option to 'git fetch'
     +    which manipulates the remote's refspec to place all resulting refs into
     +    refs/prefetch/, with further partitioning based on the destinations of
     +    those refspecs.
      
          Update the documentation to be more generic about the destination refs.
     -    Do not mention custom refpecs explicitly, as that does not need to be
     +    Do not mention custom refspecs explicitly, as that does not need to be
          highlighted in this documentation. The important part of placing refs in
     -    refs/prefetch remains.
     +    refs/prefetch/ remains.
      
          Reported-by: Tom Saeger <tom.saeger@xxxxxxxxxx>
          Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
      
       ## Documentation/git-maintenance.txt ##
     -@@ Documentation/git-maintenance.txt: prefetch::
     +@@ Documentation/git-maintenance.txt: commit-graph::
     + prefetch::
     + 	The `prefetch` task updates the object directory with the latest
       	objects from all registered remotes. For each remote, a `git fetch`
     - 	command is run. The refmap is custom to avoid updating local or remote
     - 	branches (those in `refs/heads` or `refs/remotes`). Instead, the
     +-	command is run. The refmap is custom to avoid updating local or remote
     +-	branches (those in `refs/heads` or `refs/remotes`). Instead, the
      -	remote refs are stored in `refs/prefetch/<remote>/`. Also, tags are
      -	not updated.
     -+	refs are stored in `refs/prefetch/`. Also, tags are not updated.
     ++	command is run. The configured refspec is modified to place all
     ++	requested refs within `refs/prefetch/`. Also, tags are not updated.
       +
       This is done to avoid disrupting the remote-tracking branches. The end users
       expect these refs to stay unmoved unless they initiate a fetch.  With prefetch
      
       ## builtin/gc.c ##
     -@@
     - #include "remote.h"
     - #include "object-store.h"
     - #include "exec-cmd.h"
     -+#include "refspec.h"
     - 
     - #define FAILED_RUN "failed to run %s"
     - 
      @@ builtin/gc.c: static int fetch_remote(struct remote *remote, void *cbdata)
     - {
     - 	struct maintenance_run_opts *opts = cbdata;
       	struct child_process child = CHILD_PROCESS_INIT;
     -+	int i;
       
       	child.git_cmd = 1;
     - 	strvec_pushl(&child.args, "fetch", remote->name, "--prune", "--no-tags",
     -@@ builtin/gc.c: static int fetch_remote(struct remote *remote, void *cbdata)
     +-	strvec_pushl(&child.args, "fetch", remote->name, "--prune", "--no-tags",
     ++	strvec_pushl(&child.args, "fetch", remote->name,
     ++		     "--prefetch", "--prune", "--no-tags",
     + 		     "--no-write-fetch-head", "--recurse-submodules=no",
     +-		     "--refmap=", NULL);
     ++		     NULL);
     + 
       	if (opts->quiet)
       		strvec_push(&child.args, "--quiet");
       
      -	strvec_pushf(&child.args, "+refs/heads/*:refs/prefetch/%s/*", remote->name);
     -+	for (i = 0; i < remote->fetch.nr; i++) {
     -+		struct refspec_item replace;
     -+		struct refspec_item *rsi = &remote->fetch.items[i];
     -+		struct strbuf new_dst = STRBUF_INIT;
     -+		size_t ignore_len = 0;
     -+		char *replace_string;
     -+
     -+		if (rsi->negative) {
     -+			strvec_push(&child.args, remote->fetch.raw[i]);
     -+			continue;
     -+		}
     -+
     -+		refspec_item_init(&replace, remote->fetch.raw[i], 1);
     -+
     -+		/*
     -+		 * If a refspec dst starts with "refs/" at the start,
     -+		 * then we will replace "refs/" with "refs/prefetch/".
     -+		 * Otherwise, we will prepend the dst string with
     -+		 * "refs/prefetch/".
     -+		 */
     -+		if (!strncmp(replace.dst, "refs/", 5))
     -+			ignore_len = 5;
     -+
     -+		strbuf_addstr(&new_dst, "refs/prefetch/");
     -+		strbuf_addstr(&new_dst, replace.dst + ignore_len);
     -+		free(replace.dst);
     -+		replace.dst = strbuf_detach(&new_dst, NULL);
     -+
     -+		replace_string = refspec_item_format(&replace);
     -+		strvec_push(&child.args, replace_string);
     -+		free(replace_string);
     -+
     -+		refspec_item_clear(&replace);
     -+	}
     - 
     +-
       	return !!run_command(&child);
       }
     + 
      
       ## t/t7900-maintenance.sh ##
      @@ t/t7900-maintenance.sh: test_expect_success 'prefetch multiple remotes' '
     + 	test_commit -C clone1 one &&
       	test_commit -C clone2 two &&
       	GIT_TRACE2_EVENT="$(pwd)/run-prefetch.txt" git maintenance run --task=prefetch 2>/dev/null &&
     - 	fetchargs="--prune --no-tags --no-write-fetch-head --recurse-submodules=no --refmap= --quiet" &&
     --	test_subcommand git fetch remote1 $fetchargs "+refs/heads/*:refs/prefetch/remote1/*" <run-prefetch.txt &&
     --	test_subcommand git fetch remote2 $fetchargs "+refs/heads/*:refs/prefetch/remote2/*" <run-prefetch.txt &&
     -+	test_subcommand git fetch remote1 $fetchargs "+refs/heads/*:refs/prefetch/remotes/remote1/*" <run-prefetch.txt &&
     -+	test_subcommand git fetch remote2 $fetchargs "+refs/heads/*:refs/prefetch/remotes/remote2/*" <run-prefetch.txt &&
     +-	fetchargs="--prune --no-tags --no-write-fetch-head --recurse-submodules=no --refmap= --quiet" &&
     +-	test_subcommand git fetch remote1 $fetchargs +refs/heads/\\*:refs/prefetch/remote1/\\* <run-prefetch.txt &&
     +-	test_subcommand git fetch remote2 $fetchargs +refs/heads/\\*:refs/prefetch/remote2/\\* <run-prefetch.txt &&
     ++	fetchargs="--prefetch --prune --no-tags --no-write-fetch-head --recurse-submodules=no --quiet" &&
     ++	test_subcommand git fetch remote1 $fetchargs <run-prefetch.txt &&
     ++	test_subcommand git fetch remote2 $fetchargs <run-prefetch.txt &&
       	test_path_is_missing .git/refs/remotes &&
      -	git log prefetch/remote1/one &&
      -	git log prefetch/remote2/two &&
     @@ t/t7900-maintenance.sh: test_expect_success 'prefetch multiple remotes' '
       
       	test_cmp_config refs/prefetch/ log.excludedecoration &&
       	git log --oneline --decorate --all >log &&
     - 	! grep "prefetch" log
     - '
     - 
     -+test_expect_success 'prefetch custom refspecs' '
     -+	git -C clone1 branch -f special/fetched HEAD &&
     -+	git -C clone1 branch -f special/secret/not-fetched HEAD &&
     -+
     -+	# create multiple refspecs for remote1
     -+	git config --add remote.remote1.fetch "+refs/heads/special/fetched:refs/heads/fetched" &&
     -+	git config --add remote.remote1.fetch "^refs/heads/special/secret/not-fetched" &&
     -+
     -+	GIT_TRACE2_EVENT="$(pwd)/prefetch-refspec.txt" git maintenance run --task=prefetch 2>/dev/null &&
     -+
     -+	fetchargs="--prune --no-tags --no-write-fetch-head --recurse-submodules=no --refmap= --quiet" &&
     -+
     -+	# skips second refspec because it is not a pattern type
     -+	rs1="+refs/heads/*:refs/prefetch/remotes/remote1/*" &&
     -+	rs2="+refs/heads/special/fetched:refs/prefetch/heads/fetched" &&
     -+	rs3="^refs/heads/special/secret/not-fetched" &&
     -+
     -+	test_subcommand git fetch remote1 $fetchargs "$rs1" "$rs2" "$rs3" <prefetch-refspec.txt &&
     -+	test_subcommand git fetch remote2 $fetchargs "+refs/heads/*:refs/prefetch/remotes/remote2/*" <prefetch-refspec.txt &&
     -+
     -+	# first refspec is overridden by second
     -+	test_must_fail git rev-parse refs/prefetch/special/fetched &&
     -+	git rev-parse refs/prefetch/heads/fetched &&
     -+
     -+	# possible incorrect places for the non-fetched ref
     -+	test_must_fail git rev-parse refs/prefetch/remotes/remote1/secret/not-fetched &&
     -+	test_must_fail git rev-parse refs/prefetch/remotes/remote1/not-fetched &&
     -+	test_must_fail git rev-parse refs/heads/secret/not-fetched &&
     -+	test_must_fail git rev-parse refs/heads/not-fetched
     -+'
     -+
     - test_expect_success 'prefetch and existing log.excludeDecoration values' '
     - 	git config --unset-all log.excludeDecoration &&
     - 	git config log.excludeDecoration refs/remotes/remote1/ &&

-- 
gitgitgadget



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux