On 5/18/2021 2:26 PM, Derrick Stolee wrote: > On 5/17/2021 10:27 PM, Elijah Newren wrote: >> On Fri, May 14, 2021 at 11:31 AM Derrick Stolee via GitGitGadget >> <gitgitgadget@xxxxxxxxx> wrote: >>> >>> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx> >>> >>> It is difficult, but possible, to get into a state where we intend to >>> add a directory that is outside of the sparse-checkout definition. Add a >> >> Then we need to fix that; allowing things to be added outside the >> sparse-checkout definition is a bug[1][2]. That's an invariant I >> believe we should maintain everywhere; things get really confusing to >> users somewhere later down the road if we don't. Matheus worked to >> fix that with 'git add'; if there are other commands that need fixing >> too, then we should also fix them. >> >> [1] https://lore.kernel.org/git/CABPp-BFhyFiKSXdLM5q5t=ZKzr6V0pY7dbheierRaOHFbMEdkg@xxxxxxxxxxxxxx/ >> [2] https://lore.kernel.org/git/CABPp-BF0ZhbSs42R3Bw_r-hbhQ71qtbXSBqXdq0djyaan=8p=A@xxxxxxxxxxxxxx/ >> >>> test to t1092-sparse-checkout-compatibility.sh that demonstrates this >>> using a combination of 'git reset --mixed' and 'git checkout --orphan'. >> >> I think `git checkout --orphan` should just throw an error if >> sparse-checkout is in use. Allowing adding paths outside the >> sparse-checkout set causes too much collateral and deferred confusion >> for users. > > I've been trying to strike an interesting balance of creating > performance improvements without changing behavior, trying to > defer those behavior changes to an isolated instance. I think > that approach is unavoidable with the 'git add' work that I > pulled out of this series and will return to soon. > > However, here I think it would be too much to start throwing > an error in this case. I think that change is a bit too much. > > The thing I can try to do, instead of the current approach, is > to not allow sparse directory entries to differ between the > index and HEAD. That will satisfy this case, but also a lot of > other painful cases. > > I have no idea how to actually accomplish that, but I'll start > digging. It didn't take much digging to discover that this is likely impossible, or rather it would be a drastic change to make this happen. The immediate issue is trying to prevent sparse directory entries from existing when the contained paths don't match what exists at HEAD. However, in the 'git checkout --orphan' case, we are using a full index for the unpack_trees() that updates the in-memory index according to the paths at HEAD, then updates HEAD to point to a non-existing ref. The sparse directories are only created as part of convert_to_sparse() within do_write_index(). At that point, there is no HEAD provided. Trying to load it from scratch violates the fact that HEAD is being staged to change _after_ the index updates in a command like 'git checkout'. So, the drastic change to make this work would be to update the index API to require a root tree to be provided whenever writing the index. However, that doesn't make sense, either! What do we do when in a conflicted state? What if a user modifies HEAD manually to point to a new ref? Such a change would couple the index to the concept of HEAD in an unproductive way, I think. The index data structure exists as a separate entity that is frequently _compared_ to HEAD, and the solution presented in this patch presents a way to keep the comparison of a sparse-index and HEAD to be the same as if we had a full index. So, after looking into it, I'm back in favor of this change and forever allowing sparse cache entries to differ from HEAD, because there is no way to avoid it. Thanks, -Stolee