Hi Elijah, On Wed, Jun 17, 2020 at 09:48:22AM -0700, Elijah Newren wrote: > > Well, there is `git sparse-checkout list`, assuming users know they > are in a sparse-checkout, but the whole point of my suggested change > is that they sometimes don't. Ah thats true. This was added recently and definitely slipped my mind often. > > This surprises me; I considered performance while writing it and kept > it simple on that basis. In particular: > * This does not cause any reading or writing of any extra files; it > is done solely with information that is already loaded. > * If users aren't in a sparse-checkout, their performance overhead > is a single if-check, which I doubt anyone can measure. > * If they are in a sparse-checkout, then they'd get one extra loop > over files in the index to check the SKIP_WORKTREE bit. > > In which cases would performance implications be a concern? For a > very simple point of reference, in a sparse-checkout of the linux > kernel (using --cone mode and only selecting the drivers/ directory), > I see the following timings for 'git status' in a clean checkout: > > Without my change: > [newren@tiger linux-stable (hwmon-updates|SPARSE)]$ hyperfine --warmup > 1 'git status' > Benchmark #1: git status > Time (mean ± σ): 78.8 ms ± 2.8 ms [User: 48.9 ms, System: 76.9 ms] > Range (min … max): 74.0 ms … 84.0 ms 38 runs > > With my change: > [newren@eenie linux-stable (hwmon-updates|SPARSE)]$ hyperfine --warmup > 1 'git status' > Benchmark #1: git status > Time (mean ± σ): 79.8 ms ± 2.7 ms [User: 49.3 ms, System: 77.7 ms] > Range (min … max): 74.8 ms … 84.5 ms 37 runs > > I know the linux kernel is tiny compared to repos like Windows or > Office, but the relative scaling considerations are identical: it's > one extra loop through the cached entries checking a bit for each > entry. If people are worried about the "extra loop", I could find an > existing loop to modify and just add an extra if-block in it so that > we have the same number of loops. (I'm doubtful that'd actually help, > but if the concern is an extra loop, it'd certainly avoid that.) > Anyway, if you've got more information about it being too costly, I'm > happy to listen. Otherwise, the overhead seems pretty small to me and > it's only paid by those who would benefit from the information. > > However, all that said, I have good news: Peff already implemented the > flag users can use to avoid this extra output, and did so back in > September of 2009. It's called "--porcelain". Automated commands > should already be using it, and if they aren't, they are what needs > fixing -- not the long form status output. When I wrote my initial reaction, the idea of having more than just a percentage reported back stuck in my mind, specifically with using the in-tree checkout that I mentioned. But yeah, that's something down the line to address, you are absolutely correct that the current patch has no performance impact. Thanks for the reminder about '--porcelain'. > > I think having a 'git sparse-checkout status' would be a fine > subcommand, and output like the above -- possibly also including other > bits Stolee or I mentioned elsewhere in this thread -- would be cool > and would be helpful; it'd complement what I'm doing here quite > nicely. > > But you're solving a related problem rather than the one I was > focusing on, and you have left the issue I was focusing on > unaddressed. In particular, if users forgot that they sparsified in > the first place, how are they going to know to run `git > sparse-checkout status [--all]`? > > I think having a simple line of output in `git status` would remind > them. With that reminder, they could today then go run 'git > sparse-checkout list' or 'gvfs health' (as Stolee mentioned he uses > internally) or './sparsify --info' (as I use internally) to get more > info. In the future we could provide additional things for them as > well, such as your 'git sparse-checkout status'. > I do concede that this point could be a separate problem set and addressed separately in the future. > > An aside, though, since you linked to the in-tree sparse-checkout > definitions: When I reviewed that series, the possibility of merge > conflicts and not knowing what sparse-checkout should have checked out > when the in-tree defintions themselves were in a conflicted state > seemed to me to be a pretty tough sticking point. I'm hoping someone > has a clever solution, but I still don't yet. Do you? I am no clever person, but I often take great pleasure in reading up works of smarter people. One of which is the Google's and Facebook's Mercurial extension sets that they opensourced a while ago to support large repos. The test suite for FB's 'sparse' extension[1] may address your concerns? The 'sparse' extension defines the sparse checkout definition of a working repository. It supports '--enable-profile' which take in definition files ('.sparse'). These profiles are often checked into the root dir of the repo. > > Thanks, > Elijah Regards, Son Luong. [1]: https://bitbucket.org/facebook/hg-experimental/src/05ed5d06b353aca69551f3773f56a99994a1a6bf/tests/test-sparse-profiles.t#lines-115