On Wed, Jun 17, 2020 at 8:01 PM Derrick Stolee <stolee@xxxxxxxxx> wrote: > > On 6/17/2020 9:59 PM, Elijah Newren wrote: > > On Wed, Jun 17, 2020 at 6:42 PM Derrick Stolee <stolee@xxxxxxxxx> wrote: > >> > >> On 6/17/2020 7:14 PM, Elijah Newren wrote: > >> You mentioned in another thread that it is a bit unwieldy for a user > >> to rely on a committed (or staged?) file, so adding the ability to > >> check the working directory first is interesting. I wonder how the > >> timing comes into play when changing HEAD to a new commit? Seems > >> tricky, but solvable. > > > > Isn't that essentially the same timing issue that comes into play if > > you only look at the index, and are changing HEAD to a new commit? > > It adds to the complexity. We can inspect the new index for the > in-tree definition and update the skip-worktree bits before actually > changing the working directory to match the new index. However, if > we trust the working directory copy over the index copy, then we > need to also decide if the working directory copy _would_ change > in this move before using it. > > Of course, maybe I'm making it over-complicated. I still know so > little about the index. I got into this feature due to a simple > pattern-matching problem that I could understand, but now I'm > getting lost in index states and dirty statuses. ;) Honestly, I think your first cut for switching branches with this new feature should just be: 1) Do a switch/checkout exactly the way it's done today already: 1a) Load the index and existing sparsity rules (from worktree then index), according to what existed before the branch switch 1b) Do the appropriate 1-way or 2-way merge from unpack_trees() to move to the right branch (as builtin/checkout.c already does) 2) *If* using in-tree sparsity feature, re-load the sparsity rules (from worktree then index) and run the equivalent of `sparse-checkout reapply` Not only does this avoid messing up any code for the normal case, I'm pretty sure this gets the behavior the user expects in all the special cases. There is one big downside, and a little downside. The big downside is that it'll have two progress meters instead of just one (much like sparse-checkout init && sparse-checkout set do today). The little downside is that this isn't optimal from a performance standpoint, for two reasons: (a) some files may be updated in the working tree in step 1b despite the fact that they'll be removed in step 2. (b) step 2 may just be an expensive no-op, in fact I suspect it will be most the time The reason I think the performance isn't a big deal is because: C) the fact that (b) is a problem means (a) is not often an issue -- the sparsity patterns are usually the same. D) Even if the sparsity patterns differ, it's often the case that files within the tree won't change (people tend to only modify a few files per commit, after all, so even switching branches only updates a subset of all files). E) Even if the sparsity patterns differ and files differ and have to be updated, it's still proportional at most to the number of files selected by the sparsity patterns, so it won't feel excessive or slow to the user. F) `sparse-checkout reapply` is pretty cheap, especially if called as a function instead of as a subprocess In my opinion, other than the unfortunate dual progress meters, the only place where this new feature gets hairy is that `git sparse-checkout reapply` (or its libified variant) now needs to know how to "read your new form of sparsity rules". That sounds easy at first but in my opinion it is a bit of a mess. The user could have written garbage to the file and I don't know what to do with garbage, but I think we have to do something sane. (Give up and checkout everything? Give up and checkout nothing but the toplevel directory? Ignore each line we don't understand and treat the rest as stuff that needs to be checked out? Can that last one even be done with the .ini reader?) I don't think saying "well, tough luck the user should know better" works because we already know that merges or rebases or checkout -m could have written garbage to those files (in the form of conflict markers), and users will say that *we* were the ones that filled the file with garbage. So we have to handle it intelligently in some fashion. Thoughts?