Re: [PATCH v6 0/8] Sparse Index: integrate with reset

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On Mon, Nov 29, 2021 at 7:52 AM Victoria Dye via GitGitGadget
<gitgitgadget@xxxxxxxxx> wrote:
>
> Changes since V5
> ================
>
>  * Update t1092 test 'reset with wildcard pathspec' with more cases and
>    better checks
>  * Add "special case" wildcard pathspec check when determining whether to
>    expand the index (avoids double-loop over pathspecs & index entries)

Looks pretty good.  However, I'm worried this special case you added
at my prodding might be problematic, and that I may have been wrong to
prod you into it...

> Thanks! -Victoria
>
> Range-diff vs v5:
>
>  7:  a9135a5ed64 ! 7:  822d7344587 reset: make --mixed sparse-aware
>      @@ Commit message
>
>           Remove the `ensure_full_index` guard on `read_from_tree` and update `git
>           reset --mixed` to ensure it can use sparse directory index entries wherever
>      -    possible. Sparse directory entries are reset use `diff_tree_oid`, which
>      +    possible. Sparse directory entries are reset using `diff_tree_oid`, which
>           requires `change` and `add_remove` functions to process the internal
>           contents of the sparse directory. The `recursive` diff option handles cases
>           in which `reset --mixed` must diff/merge files that are nested multiple
>      @@ builtin/reset.c: static void update_index_from_diff(struct diff_queue_struct *q,
>       +          * (since we can reset whole sparse directories without expanding them).
>       +          */
>       +         if (item.nowildcard_len < item.len) {
>      ++                 /*
>      ++                  * Special case: if the pattern is a path inside the cone
>      ++                  * followed by only wildcards, the pattern cannot match
>      ++                  * partial sparse directories, so we don't expand the index.
>      ++                  */
>      ++                 if (path_in_cone_mode_sparse_checkout(item.original, &the_index) &&
>      ++                     strspn(item.original + item.nowildcard_len, "*") == item.len - item.nowildcard_len)

I usually expect in an &&-chain to see the cheaper function call first
(because that ordering often avoids the need to call the second
function), and I would presume that strspn() would be the cheaper of
the two.  Did you switch the order because you expect the strspn call
to nearly always return true, though?

Could the strspn() call be replaced by a `item.len ==
item.nowildcard_len + 1`?  I mean, sure, folks could list multiple
asterisks in a row in their pathspec, but that seems super unlikely
and even if it does happen the code will just fall back to the slower
codepath and still give them the right answer.  And the simpler check
feels a lot easier to parse for human readers.

But I'm worried there's a deeper issue here:


Is the wildcard character (or characters) in path treated as a literal
by path_in_cone_mode_sparse_checkout()?  I think it is...and I'm
worried that may be incorrect.  For example, if the path is

   foo/*

and the user has done a

  git sparse-checkout set foo/bar/

Then 'foo/baz/file' is not in the sparse checkout.  However, 'foo/*'
should match 'foo/baz/file' and yet 'foo/*' when treated as a literal
path would be considered in the sparse checkout by
path_in_cone_mode_sparse_checkout.  Does this result in the code
returning an incorrect answer?  (Or did I misunderstand something so
far?)

I'm wondering if I misled you earlier in my musings about whether we
could avoid the slow codepath for pathspecs with wildcard characters.
Maybe there's no safe optimization here and wildcard characters should
always go through the slower codepath.

>      ++                         continue;
>      ++
>       +                 for (pos = 0; pos < active_nr; pos++) {
>       +                         struct cache_entry *ce = active_cache[pos];
>       +
>  8:  f91d1dcf024 = 8:  ddd97fb2837 unpack-trees: improve performance of next_cache_entry
>
> --
> gitgitgadget



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux