Elijah Newren wrote: > Hi, > > On Mon, Nov 29, 2021 at 7:52 AM Victoria Dye via GitGitGadget > <gitgitgadget@xxxxxxxxx> wrote: >> >> Changes since V5 >> ================ >> >> * Update t1092 test 'reset with wildcard pathspec' with more cases and >> better checks >> * Add "special case" wildcard pathspec check when determining whether to >> expand the index (avoids double-loop over pathspecs & index entries) > > Looks pretty good. However, I'm worried this special case you added > at my prodding might be problematic, and that I may have been wrong to > prod you into it... > >> Thanks! -Victoria >> >> Range-diff vs v5: >> >> 7: a9135a5ed64 ! 7: 822d7344587 reset: make --mixed sparse-aware >> @@ Commit message >> >> Remove the `ensure_full_index` guard on `read_from_tree` and update `git >> reset --mixed` to ensure it can use sparse directory index entries wherever >> - possible. Sparse directory entries are reset use `diff_tree_oid`, which >> + possible. Sparse directory entries are reset using `diff_tree_oid`, which >> requires `change` and `add_remove` functions to process the internal >> contents of the sparse directory. The `recursive` diff option handles cases >> in which `reset --mixed` must diff/merge files that are nested multiple >> @@ builtin/reset.c: static void update_index_from_diff(struct diff_queue_struct *q, >> + * (since we can reset whole sparse directories without expanding them). >> + */ >> + if (item.nowildcard_len < item.len) { >> ++ /* >> ++ * Special case: if the pattern is a path inside the cone >> ++ * followed by only wildcards, the pattern cannot match >> ++ * partial sparse directories, so we don't expand the index. >> ++ */ >> ++ if (path_in_cone_mode_sparse_checkout(item.original, &the_index) && >> ++ strspn(item.original + item.nowildcard_len, "*") == item.len - item.nowildcard_len) > > I usually expect in an &&-chain to see the cheaper function call first > (because that ordering often avoids the need to call the second > function), and I would presume that strspn() would be the cheaper of > the two. Did you switch the order because you expect the strspn call > to nearly always return true, though? > This is a miss on my part, the `strspn()` check is probably less expensive and should be first. > Could the strspn() call be replaced by a `item.len == > item.nowildcard_len + 1`? I mean, sure, folks could list multiple > asterisks in a row in their pathspec, but that seems super unlikely > and even if it does happen the code will just fall back to the slower > codepath and still give them the right answer. And the simpler check > feels a lot easier to parse for human readers. > Agreed on wanting better readability - if the multiple-wildcard case is unlikely, the `PATHSPEC_ONESTAR` flag would indicate whether the pathspec ends in a single wildcard character. If that flag is still too obscure, though, I can stick with the length comparison. > But I'm worried there's a deeper issue here: > > > Is the wildcard character (or characters) in path treated as a literal > by path_in_cone_mode_sparse_checkout()? I think it is...and I'm > worried that may be incorrect. For example, if the path is > > foo/* > > and the user has done a > > git sparse-checkout set foo/bar/ > > Then 'foo/baz/file' is not in the sparse checkout. However, 'foo/*' > should match 'foo/baz/file' and yet 'foo/*' when treated as a literal > path would be considered in the sparse checkout by > path_in_cone_mode_sparse_checkout. Does this result in the code > returning an incorrect answer? (Or did I misunderstand something so > far?) > Correct: `path_in_cone_mode_sparse_checkout` interprets the wildcard literally, and the checks here take that into account. The goal of `pathspec_needs_expanded_index` is to determine if the pathspec *may* match only partial contents of a sparse directory (like '*.c', or 'f*le'). For a `git reset --mixed`, only this scenario requires expansion; if an entire sparse directory is matched by a pathspec, the entire sparse directory is reset. Using your example, 'foo/*' does match 'foo/baz/file', but it also matches 'foo/' itself; as a result, the `foo/` sparse directory index entry is reset (rather than some individual files contained within it). The same goes for a patchspec like 'fo*' ("in-cone" and ending in a wildcard). Conversely, a pathspec like 'foo/ba*' would _not_ work (it wouldn't match something like 'foo/test-file'), and neither would 'f*o' (it would match all of 'foo', but would only match files ending in "o" in a directory 'f/'). Hope that helps! > I'm wondering if I misled you earlier in my musings about whether we > could avoid the slow codepath for pathspecs with wildcard characters. > Maybe there's no safe optimization here and wildcard characters should > always go through the slower codepath. > >> ++ continue; >> ++ >> + for (pos = 0; pos < active_nr; pos++) { >> + struct cache_entry *ce = active_cache[pos]; >> + >> 8: f91d1dcf024 = 8: ddd97fb2837 unpack-trees: improve performance of next_cache_entry >> >> -- >> gitgitgadget