Re: [PATCH 2/2] cache-tree: prefetch in partial clone read-tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jul 23, 2021 at 11:55 AM Jonathan Tan <jonathantanmy@xxxxxxxxxx> wrote:
>
> "git read-tree" checks the existence of the blobs referenced by the
> given tree, but does not bulk prefetch them. Add a bulk prefetch.
>
> The lack of prefetch here was noticed at $DAYJOB during a merge
> involving some specific commits, but I couldn't find a minimal merge
> that didn't also trigger the prefetch in check_updates() in
> unpack-trees.c (and in all these cases, the lack of prefetch in
> cache-tree.c didn't matter because all the relevant blobs would have
> already been prefetched by then). This is why I used read-tree here to
> exercise this code path.

Okay, you have me stumped, I can't figure out what kind of a merge
would bypass the check_updates() in unpack-trees.c either.  I was
curious about octopus or merge.autostash, but I just can't trigger it.

Using read-tree to trigger the case makes perfect sense, though.

> Signed-off-by: Jonathan Tan <jonathantanmy@xxxxxxxxxx>
> ---
>  cache-tree.c                       | 11 ++++++++--
>  t/t1022-read-tree-partial-clone.sh | 33 ++++++++++++++++++++++++++++++
>  2 files changed, 42 insertions(+), 2 deletions(-)
>  create mode 100755 t/t1022-read-tree-partial-clone.sh
>
> diff --git a/cache-tree.c b/cache-tree.c
> index 45e58666af..9ba2c7c6b2 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -237,6 +237,11 @@ int cache_tree_fully_valid(struct cache_tree *it)
>         return 1;
>  }
>
> +static int must_check_existence(const struct cache_entry *ce)
> +{
> +       return !(has_promisor_remote() && ce_skip_worktree(ce));
> +}
> +
>  static int update_one(struct cache_tree *it,
>                       struct cache_entry **cache,
>                       int entries,
> @@ -378,8 +383,7 @@ static int update_one(struct cache_tree *it,
>                 }
>
>                 ce_missing_ok = mode == S_IFGITLINK || missing_ok ||
> -                       (has_promisor_remote() &&
> -                        ce_skip_worktree(ce));
> +                       !must_check_existence(ce);
>                 if (is_null_oid(oid) ||
>                     (!ce_missing_ok && !has_object_file(oid))) {
>                         strbuf_release(&buffer);
> @@ -466,6 +470,9 @@ int cache_tree_update(struct index_state *istate, int flags)
>         if (!istate->cache_tree)
>                 istate->cache_tree = cache_tree();
>
> +       if (!(flags & WRITE_TREE_MISSING_OK) && has_promisor_remote())
> +               prefetch_cache_entries(istate, must_check_existence);
> +

Nice that the fix is so simple.

>         trace_performance_enter();
>         trace2_region_enter("cache_tree", "update", the_repository);
>         i = update_one(istate->cache_tree, istate->cache, istate->cache_nr,
> diff --git a/t/t1022-read-tree-partial-clone.sh b/t/t1022-read-tree-partial-clone.sh
> new file mode 100755
> index 0000000000..a763e27c7d
> --- /dev/null
> +++ b/t/t1022-read-tree-partial-clone.sh
> @@ -0,0 +1,33 @@
> +#!/bin/sh
> +
> +test_description='git read-tree in partial clones'
> +
> +TEST_NO_CREATE_REPO=1
> +
> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'read-tree in partial clone prefetches in one batch' '
> +       test_when_finished "rm -rf server client trace" &&
> +
> +       git init server &&
> +       echo foo >server/one &&
> +       echo bar >server/two &&
> +       git -C server add one two &&
> +       git -C server commit -m "initial commit" &&
> +       TREE=$(git -C server rev-parse HEAD^{tree}) &&
> +
> +       git -C server config uploadpack.allowfilter 1 &&
> +       git -C server config uploadpack.allowanysha1inwant 1 &&
> +       git clone --bare --filter=blob:none "file://$(pwd)/server" client &&
> +       GIT_TRACE_PACKET="$(pwd)/trace" git -C client read-tree $TREE &&
> +
> +       # "done" marks the end of negotiation (once per fetch). Expect that
> +       # only one fetch occurs.
> +       grep "fetch> done" trace >donelines &&
> +       test_line_count = 1 donelines
> +'
> +
> +test_done
> --
> 2.32.0.432.gabb21c7263-goog

Any reason for preferring GIT_TRACE_PACKET over GIT_TRACE2_PERF and
looking for the reported fetch_count (or even the number of
fetch_count lines)?  Just curious.

Anyway, looks good to me.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux