On Mon, Nov 29, 2021 at 11:44 AM Victoria Dye <vdye@xxxxxxxxxx> wrote: > > Elijah Newren wrote: > > Hi, > > > > On Mon, Nov 29, 2021 at 7:52 AM Victoria Dye via GitGitGadget > > <gitgitgadget@xxxxxxxxx> wrote: > >> > >> Changes since V5 > >> ================ > >> > >> * Update t1092 test 'reset with wildcard pathspec' with more cases and > >> better checks > >> * Add "special case" wildcard pathspec check when determining whether to > >> expand the index (avoids double-loop over pathspecs & index entries) > > > > Looks pretty good. However, I'm worried this special case you added > > at my prodding might be problematic, and that I may have been wrong to > > prod you into it... > > > >> Thanks! -Victoria > >> > >> Range-diff vs v5: > >> > >> 7: a9135a5ed64 ! 7: 822d7344587 reset: make --mixed sparse-aware > >> @@ Commit message > >> > >> Remove the `ensure_full_index` guard on `read_from_tree` and update `git > >> reset --mixed` to ensure it can use sparse directory index entries wherever > >> - possible. Sparse directory entries are reset use `diff_tree_oid`, which > >> + possible. Sparse directory entries are reset using `diff_tree_oid`, which > >> requires `change` and `add_remove` functions to process the internal > >> contents of the sparse directory. The `recursive` diff option handles cases > >> in which `reset --mixed` must diff/merge files that are nested multiple > >> @@ builtin/reset.c: static void update_index_from_diff(struct diff_queue_struct *q, > >> + * (since we can reset whole sparse directories without expanding them). > >> + */ > >> + if (item.nowildcard_len < item.len) { > >> ++ /* > >> ++ * Special case: if the pattern is a path inside the cone > >> ++ * followed by only wildcards, the pattern cannot match > >> ++ * partial sparse directories, so we don't expand the index. > >> ++ */ > >> ++ if (path_in_cone_mode_sparse_checkout(item.original, &the_index) && > >> ++ strspn(item.original + item.nowildcard_len, "*") == item.len - item.nowildcard_len) > > > > I usually expect in an &&-chain to see the cheaper function call first > > (because that ordering often avoids the need to call the second > > function), and I would presume that strspn() would be the cheaper of > > the two. Did you switch the order because you expect the strspn call > > to nearly always return true, though? > > > > This is a miss on my part, the `strspn()` check is probably less expensive > and should be first. > > > Could the strspn() call be replaced by a `item.len == > > item.nowildcard_len + 1`? I mean, sure, folks could list multiple > > asterisks in a row in their pathspec, but that seems super unlikely > > and even if it does happen the code will just fall back to the slower > > codepath and still give them the right answer. And the simpler check > > feels a lot easier to parse for human readers. > > > > Agreed on wanting better readability - if the multiple-wildcard case is > unlikely, the `PATHSPEC_ONESTAR` flag would indicate whether the pathspec > ends in a single wildcard character. If that flag is still too obscure, > though, I can stick with the length comparison. > > > But I'm worried there's a deeper issue here: > > > > > > Is the wildcard character (or characters) in path treated as a literal > > by path_in_cone_mode_sparse_checkout()? I think it is...and I'm > > worried that may be incorrect. For example, if the path is > > > > foo/* > > > > and the user has done a > > > > git sparse-checkout set foo/bar/ > > > > Then 'foo/baz/file' is not in the sparse checkout. However, 'foo/*' > > should match 'foo/baz/file' and yet 'foo/*' when treated as a literal > > path would be considered in the sparse checkout by > > path_in_cone_mode_sparse_checkout. Does this result in the code > > returning an incorrect answer? (Or did I misunderstand something so > > far?) > > > > Correct: `path_in_cone_mode_sparse_checkout` interprets the wildcard > literally, and the checks here take that into account. The goal of > `pathspec_needs_expanded_index` is to determine if the pathspec *may* match > only partial contents of a sparse directory (like '*.c', or 'f*le'). For a > `git reset --mixed`, only this scenario requires expansion; if an entire > sparse directory is matched by a pathspec, the entire sparse directory is > reset. > > Using your example, 'foo/*' does match 'foo/baz/file', but it also matches > 'foo/' itself; as a result, the `foo/` sparse directory index entry is reset > (rather than some individual files contained within it). The same goes for a > patchspec like 'fo*' ("in-cone" and ending in a wildcard). Conversely, a > pathspec like 'foo/ba*' would _not_ work (it wouldn't match something like > 'foo/test-file'), and neither would 'f*o' (it would match all of 'foo', but > would only match files ending in "o" in a directory 'f/'). > > Hope that helps! Ah, yes, thanks for the explanation. :-) > > I'm wondering if I misled you earlier in my musings about whether we > > could avoid the slow codepath for pathspecs with wildcard characters. > > Maybe there's no safe optimization here and wildcard characters should > > always go through the slower codepath. > > > >> ++ continue; > >> ++ > >> + for (pos = 0; pos < active_nr; pos++) { > >> + struct cache_entry *ce = active_cache[pos]; > >> + > >> 8: f91d1dcf024 = 8: ddd97fb2837 unpack-trees: improve performance of next_cache_entry > >> > >> -- > >> gitgitgadget >