Hi Son, Thanks for the feedback. On Wed, Jun 17, 2020 at 12:40 AM Son Luong Ngoc <sluongng@xxxxxxxxx> wrote: > > Hi Elijah, > > > Some of the feedback of folks trying out sparse-checkouts at $dayjob is that > > sparse checkouts can sometimes be disorienting; users can forget that they > > had a sparse-checkout and then wonder where files went. > > I agree with this observation: that the current 'git sparse-checkout' experience > could be a bit 'lost' for end users, who may or may not be familiar > with git's 'arcane magic'. > > Currently the only way to verify what's going on is to either run > 'tree <repo-root-dir>' > or 'cat .git/info/sparse-checkout' (human-readable but not easy). Well, there is `git sparse-checkout list`, assuming users know they are in a sparse-checkout, but the whole point of my suggested change is that they sometimes don't. > > This series adds some output to 'git status' and modifies git-prompt slightly as an attempt > > to help. > > This is a great idea but I suggest to put a config/flag to let users > enable/disable this. > > Git status is often utilized in automated commands (IDE, shell prompt, > etc...) and there may be > performance implications down the line not being able to skip this bit > of information out. This surprises me; I considered performance while writing it and kept it simple on that basis. In particular: * This does not cause any reading or writing of any extra files; it is done solely with information that is already loaded. * If users aren't in a sparse-checkout, their performance overhead is a single if-check, which I doubt anyone can measure. * If they are in a sparse-checkout, then they'd get one extra loop over files in the index to check the SKIP_WORKTREE bit. In which cases would performance implications be a concern? For a very simple point of reference, in a sparse-checkout of the linux kernel (using --cone mode and only selecting the drivers/ directory), I see the following timings for 'git status' in a clean checkout: Without my change: [newren@tiger linux-stable (hwmon-updates|SPARSE)]$ hyperfine --warmup 1 'git status' Benchmark #1: git status Time (mean ± σ): 78.8 ms ± 2.8 ms [User: 48.9 ms, System: 76.9 ms] Range (min … max): 74.0 ms … 84.0 ms 38 runs With my change: [newren@eenie linux-stable (hwmon-updates|SPARSE)]$ hyperfine --warmup 1 'git status' Benchmark #1: git status Time (mean ± σ): 79.8 ms ± 2.7 ms [User: 49.3 ms, System: 77.7 ms] Range (min … max): 74.8 ms … 84.5 ms 37 runs I know the linux kernel is tiny compared to repos like Windows or Office, but the relative scaling considerations are identical: it's one extra loop through the cached entries checking a bit for each entry. If people are worried about the "extra loop", I could find an existing loop to modify and just add an extra if-block in it so that we have the same number of loops. (I'm doubtful that'd actually help, but if the concern is an extra loop, it'd certainly avoid that.) Anyway, if you've got more information about it being too costly, I'm happy to listen. Otherwise, the overhead seems pretty small to me and it's only paid by those who would benefit from the information. However, all that said, I have good news: Peff already implemented the flag users can use to avoid this extra output, and did so back in September of 2009. It's called "--porcelain". Automated commands should already be using it, and if they aren't, they are what needs fixing -- not the long form status output. > > For reference, I suspect that in repositories that are large enough that > > people always use sparse-checkouts (e.g. windows or office repos), that this > > isn't a problem. But when the repository is approximately > > linux-kernel-sized, then it is reasonable for some folks to have a full > > checkout. sparse-checkouts, however, can provide various build system and > > IDE performance improvements, so we have a split of users who have > > sparse-checkouts and those who have full checkouts. It's easy for users who > > are bridging in between the two worlds or just trying out sparse-checkouts > > for the first time to get confused. > > One of our users noted that the experience is improved when combining > 'git worktree' with sparse-checkout. > That way you get the correct sparsity for the topic that you are working on. > > In a way, the current sparse-checkout experience is similar to a user > running 'git checkout <rev>' directly > instead of checking out a branch. > It does not feel tangible and reproducible. > > I was hoping that these concerns will be addressed once the In-Tree > Sparse-Checkout Definition RFC[1] patch landed. > We should then be able to print out which Definition File(s) (we often > call it manifests) were used, > and ideally, only the top most file(s) in the inheritance tree. > > So the ideal experience, in my mind, is something of this sort: > > git sc init --cone > > # assuming a inherited from b and c > git sc add --in-tree manifest-dir/module-a.manifest > git sc add --in-tree manifest-dir/module-d.manifest > > git sc status > Your sparse checkout includes following definition(s): > (1) manifest-dir/module-a.manifest > (2) manifest-dir/module-d.manifest > > git sc status --all > Your sparse checkout includes following definition(s): > (1) manifest-dir/module-a.manifest > (2) manifest-dir/module-d.manifest > (3) manifest-dir/module-b.manifest (included by 1) > (4) manifest-dir/module-c.manifest (included by 1) I think having a 'git sparse-checkout status' would be a fine subcommand, and output like the above -- possibly also including other bits Stolee or I mentioned elsewhere in this thread -- would be cool and would be helpful; it'd complement what I'm doing here quite nicely. But you're solving a related problem rather than the one I was focusing on, and you have left the issue I was focusing on unaddressed. In particular, if users forgot that they sparsified in the first place, how are they going to know to run `git sparse-checkout status [--all]`? I think having a simple line of output in `git status` would remind them. With that reminder, they could today then go run 'git sparse-checkout list' or 'gvfs health' (as Stolee mentioned he uses internally) or './sparsify --info' (as I use internally) to get more info. In the future we could provide additional things for them as well, such as your 'git sparse-checkout status'. An aside, though, since you linked to the in-tree sparse-checkout definitions: When I reviewed that series, the possibility of merge conflicts and not knowing what sparse-checkout should have checked out when the in-tree defintions themselves were in a conflicted state seemed to me to be a pretty tough sticking point. I'm hoping someone has a clever solution, but I still don't yet. Do you? Thanks, Elijah