On 11/22/2021 1:36 PM, Elijah Newren wrote: > On Tue, Nov 16, 2021 at 7:38 AM Derrick Stolee via GitGitGadget > <gitgitgadget@xxxxxxxxx> wrote: >> >> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx> >> >> Existing callers to 'git ls-files' are expecting file names, not >> directories. It is best to expand a sparse index to show all of the >> contained files in this case. >> >> However, expert users may want to inspect the contents of the index >> itself including which directories are sparse. Add a --sparse option to >> allow users to request this information. >> >> During testing, I noticed that options such as --modified did not affect >> the output when the files in question were outside the sparse-checkout >> definition. Tests are added to document this preexisting behavior and >> how it remains unchanged with the sparse index and the --sparse option. > > Sure, 'git diff' and 'git status' don't show anything for such files > either; we're being consistent. However, this feels like it's > normalizing an erroneous condition, possibly starting to cement one of > the bigger UI problems we have the sparse-checkout (and sparse-index): > > I assert that: Having tracked files be SKIP_WORKTREE despite having a > file present in the working directory is typically an *error*, and one > that we need to help users prevent/discover/recover-from. I do think we need to tread carefully here, since some users use 'git update-index --skip-worktree' to mark a file in their worktree as "ignored" so they can change it from HEAD without it ever being staged. This is usually independent of the sparse- checkout feature, so you might want to focus your efforts to paths that exist in the worktree but are outside of the sparse-checkout patterns. > * I've seen users get into this condition by doing the equivalent of > either 'git sparse-checkout disable' or 'git sparse-checkout set > $NEW_PATHS' and then hit Ctrl-C in the middle of its operation. > * Users also just occasionally copy files from other places (maybe > even `git show REVISION:FILE >FILE`), and potentially edit. > > In either of the above cases: > * There is no command provided for discovering these files; not diff, > status, ls-files, or anything > * There is no command provided for correcting the problem; not clean, > not reset, or anything. They have to do it by hand > * There are ample opportunities for the error to propagate and grow, > due to the fact that these files will not be reported or updated.[1] > > [1] For example, the user may first either switch to another branch or > resets to some other commit. Neither of those operations update these > erroneous present-despite-SKIP_WORKTREE files. And there will still > be nothing that reports them. But now the files are out-of-date with > respect to the new commit. Now if the user disables or changes the > sparse checkout, they suddenly have several "changed" files in their > working tree -- which they didn't touch. In the best case, they run > diff or status or something and notice the files and then correct it, > and just get perplexed at where the changes came from. But with > enough users, some percentage of them are just going to commit (some > of) those changes. When someone else asks why they modified those > files, they'll (correctly, as it turns out) claim they never did and > that no program on their system did. They might even think those > files weren't part of the commit and claim that git modified the > commit behind their back, which would be wrong, but there won't be any > reasonable logs to check to prove what happened. > > > I know this issue is out of scope for your series here, but the > addition of testcases that purposely set up an erroneous condition > makes it feel like we're trying to normalize that situation and not > treat it like an error, and are perhap undercutting future attempts to > try to fix it. I'd rather have it explicitly stated that we're > setting up an erroneous condition in our tests, in order to make sure > we handle it as best as we can so far -- in a manner in line with diff > and grep -- but also note that later solutions to this deeper issue > may affect one or more of these commands. I appreciate the attention to this scenario, but I also think that a patch series whose goal is to change how we react when in this case already needs to make a case for changing the behavior. Having tests that exist documenting the previous behavior form a good basis for "it did this before, but now it will do this," which hopefully further justifies the change. Personally, I think of _every_ test as "this is what the code does right now" and the purpose of a test is to show that we don't change things _unintentionally_. Every test can be changed if there is sufficient evidence that it should. >> @@ -838,6 +844,7 @@ test_expect_success 'sparse-index is not expanded' ' >> init_repos && >> >> ensure_not_expanded status && >> + ensure_not_expanded ls-files --sparse && > > Do we also want a > ensure_not_expanded ls-files && > ? We don't expect ls-files to write a new index file in any scenario, right? When the index is sparse, 'git ls-files' will expand before listing all of the contents, to avoid listing a sparse directory entry. One way to avoid ensure_full_index() would be to do trickery with read_tree_at() whenever we find a sparse directory entry and use the callback to output what ls-files would have written. However, that's pretty much the same amount of work as what ensure_full_index() is doing, so there is likely little benefit to complicating the code for this reason. >> ensure_not_expanded commit --allow-empty -m empty && >> echo >>sparse-index/a && >> ensure_not_expanded commit -a -m a && >> @@ -942,6 +949,46 @@ test_expect_success 'sparse index is not expanded: fetch/pull' ' >> ensure_not_expanded pull full base >> ' >> >> +test_expect_success 'ls-files' ' >> + init_repos && >> + >> + # Behavior agrees by default. Sparse index is expanded. >> + test_all_match git ls-files && >> + >> + # With --sparse, the sparse index data changes behavior. >> + git -C sparse-index ls-files --sparse >sparse-index-out && >> + grep "^folder1/\$" sparse-index-out && >> + grep "^folder2/\$" sparse-index-out && >> + >> + # With --sparse and no sparse index, nothing changes. >> + git -C sparse-checkout ls-files --sparse >sparse-checkout-out && >> + grep "^folder1/0/0/0\$" sparse-checkout-out && >> + ! grep "/\$" sparse-checkout-out && > > I'd be tempted to split the test up to this point from the rest of the test. Every instance of "init_repos" adds to the cost of this test script, so when possible I'd err on the side of grouping them. It also keeps test able to isolate with "--run=1,<N>". >> + >> + write_script edit-content <<-\EOF && >> + mkdir folder1 && >> + echo content >>folder1/a >> + EOF >> + run_on_sparse ../edit-content && > > As above, since folder1/a is a tracked file, I'd rather we explicitly > called out that we're intentionally setting up an erroneous condition. > >> + >> + # ls-files does not notice modified files whose >> + # cache entries are marked SKIP_WORKTREE. > > ...nor does diff, status, etc., as per my lengthy comment from the > beginning of the email. I think the existence of this comment points out that "this is such a strange situation that it requires explanation." A slight comment change such as # In this strange situation where a tracked file # exists in the worktree but is marked with the # SKIP_WORKTREE bit in the index, Git ignores the # worktree contents. That both points out that this case is unusual, but also that ls-files isn't the only command that does this. > Other than my concerns about careful messages with erroneous > conditions (and the minor question about also having a > ensure_not_expanded without the --sparse flag), the patch looks good > to me. Thanks for your careful read! -Stolee