Hi, On Mon, Nov 29, 2021 at 7:52 AM Victoria Dye via GitGitGadget <gitgitgadget@xxxxxxxxx> wrote: > > Changes since V5 > ================ > > * Update t1092 test 'reset with wildcard pathspec' with more cases and > better checks > * Add "special case" wildcard pathspec check when determining whether to > expand the index (avoids double-loop over pathspecs & index entries) Looks pretty good. However, I'm worried this special case you added at my prodding might be problematic, and that I may have been wrong to prod you into it... > Thanks! -Victoria > > Range-diff vs v5: > > 7: a9135a5ed64 ! 7: 822d7344587 reset: make --mixed sparse-aware > @@ Commit message > > Remove the `ensure_full_index` guard on `read_from_tree` and update `git > reset --mixed` to ensure it can use sparse directory index entries wherever > - possible. Sparse directory entries are reset use `diff_tree_oid`, which > + possible. Sparse directory entries are reset using `diff_tree_oid`, which > requires `change` and `add_remove` functions to process the internal > contents of the sparse directory. The `recursive` diff option handles cases > in which `reset --mixed` must diff/merge files that are nested multiple > @@ builtin/reset.c: static void update_index_from_diff(struct diff_queue_struct *q, > + * (since we can reset whole sparse directories without expanding them). > + */ > + if (item.nowildcard_len < item.len) { > ++ /* > ++ * Special case: if the pattern is a path inside the cone > ++ * followed by only wildcards, the pattern cannot match > ++ * partial sparse directories, so we don't expand the index. > ++ */ > ++ if (path_in_cone_mode_sparse_checkout(item.original, &the_index) && > ++ strspn(item.original + item.nowildcard_len, "*") == item.len - item.nowildcard_len) I usually expect in an &&-chain to see the cheaper function call first (because that ordering often avoids the need to call the second function), and I would presume that strspn() would be the cheaper of the two. Did you switch the order because you expect the strspn call to nearly always return true, though? Could the strspn() call be replaced by a `item.len == item.nowildcard_len + 1`? I mean, sure, folks could list multiple asterisks in a row in their pathspec, but that seems super unlikely and even if it does happen the code will just fall back to the slower codepath and still give them the right answer. And the simpler check feels a lot easier to parse for human readers. But I'm worried there's a deeper issue here: Is the wildcard character (or characters) in path treated as a literal by path_in_cone_mode_sparse_checkout()? I think it is...and I'm worried that may be incorrect. For example, if the path is foo/* and the user has done a git sparse-checkout set foo/bar/ Then 'foo/baz/file' is not in the sparse checkout. However, 'foo/*' should match 'foo/baz/file' and yet 'foo/*' when treated as a literal path would be considered in the sparse checkout by path_in_cone_mode_sparse_checkout. Does this result in the code returning an incorrect answer? (Or did I misunderstand something so far?) I'm wondering if I misled you earlier in my musings about whether we could avoid the slow codepath for pathspecs with wildcard characters. Maybe there's no safe optimization here and wildcard characters should always go through the slower codepath. > ++ continue; > ++ > + for (pos = 0; pos < active_nr; pos++) { > + struct cache_entry *ce = active_cache[pos]; > + > 8: f91d1dcf024 = 8: ddd97fb2837 unpack-trees: improve performance of next_cache_entry > > -- > gitgitgadget