On Tue, May 18, 2021 at 12:05 PM Derrick Stolee <stolee@xxxxxxxxx> wrote: > > On 5/18/2021 2:26 PM, Derrick Stolee wrote: > > On 5/17/2021 10:27 PM, Elijah Newren wrote: > >> On Fri, May 14, 2021 at 11:31 AM Derrick Stolee via GitGitGadget > >> <gitgitgadget@xxxxxxxxx> wrote: > >>> > >>> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > >>> > >>> It is difficult, but possible, to get into a state where we intend to > >>> add a directory that is outside of the sparse-checkout definition. Add a > >> > >> Then we need to fix that; allowing things to be added outside the > >> sparse-checkout definition is a bug[1][2]. That's an invariant I > >> believe we should maintain everywhere; things get really confusing to > >> users somewhere later down the road if we don't. Matheus worked to > >> fix that with 'git add'; if there are other commands that need fixing > >> too, then we should also fix them. > >> > >> [1] https://lore.kernel.org/git/CABPp-BFhyFiKSXdLM5q5t=ZKzr6V0pY7dbheierRaOHFbMEdkg@xxxxxxxxxxxxxx/ > >> [2] https://lore.kernel.org/git/CABPp-BF0ZhbSs42R3Bw_r-hbhQ71qtbXSBqXdq0djyaan=8p=A@xxxxxxxxxxxxxx/ > >> > >>> test to t1092-sparse-checkout-compatibility.sh that demonstrates this > >>> using a combination of 'git reset --mixed' and 'git checkout --orphan'. > >> > >> I think `git checkout --orphan` should just throw an error if > >> sparse-checkout is in use. Allowing adding paths outside the > >> sparse-checkout set causes too much collateral and deferred confusion > >> for users. > > > > I've been trying to strike an interesting balance of creating > > performance improvements without changing behavior, trying to > > defer those behavior changes to an isolated instance. I think > > that approach is unavoidable with the 'git add' work that I > > pulled out of this series and will return to soon. > > > > However, here I think it would be too much to start throwing > > an error in this case. I think that change is a bit too much. > > > > The thing I can try to do, instead of the current approach, is > > to not allow sparse directory entries to differ between the > > index and HEAD. That will satisfy this case, but also a lot of > > other painful cases. > > > > I have no idea how to actually accomplish that, but I'll start > > digging. > > It didn't take much digging to discover that this is likely > impossible, or rather it would be a drastic change to make this > happen. > > The immediate issue is trying to prevent sparse directory entries > from existing when the contained paths don't match what exists at > HEAD. However, in the 'git checkout --orphan' case, we are using > a full index for the unpack_trees() that updates the in-memory > index according to the paths at HEAD, then updates HEAD to point > to a non-existing ref. The sparse directories are only created as > part of convert_to_sparse() within do_write_index(). At that > point, there is no HEAD provided. Trying to load it from scratch > violates the fact that HEAD is being staged to change _after_ the > index updates in a command like 'git checkout'. > > So, the drastic change to make this work would be to update the > index API to require a root tree to be provided whenever writing > the index. However, that doesn't make sense, either! What do we > do when in a conflicted state? > > What if a user modifies HEAD manually to point to a new ref? > > Such a change would couple the index to the concept of HEAD in > an unproductive way, I think. The index data structure exists > as a separate entity that is frequently _compared_ to HEAD, and > the solution presented in this patch presents a way to keep the > comparison of a sparse-index and HEAD to be the same as if we > had a full index. > > So, after looking into it, I'm back in favor of this change and > forever allowing sparse cache entries to differ from HEAD, > because there is no way to avoid it. Doh, thanks for digging in and entertaining the idea. I'm worried we'll get lots of confused users over the years from not being able to do this, but you do make some good points. I still think `git checkout --orphan` should be an error when in a sparse checkout -- the point of a sparse checkout is that you only care about a subset of files, whereas checkout --orphan fundamentally says you are throwing away history but care about each and every file since you are staging "changes" from all of them to include in some new commit soon. They just seem in strong opposition to me, and it seems likely to result in surprises for some of the users when despite the --orphan request and them fixing up the working directory how they like, they get some new commit that contains files that aren't in their working tree. (In contrast, `git switch --orphan` would probably be fine in a sparse checkout, precisely because it really does empty everything). However, I do agree with you that such a change belongs in a separate series. So, yes, your patch is good, and I'll raise the behavioral change later. (Sorry for being slow to respond and still not getting to all your good reviews of my series; I'm a bit limited in my time for git right now...)