Re: [PATCH 0/2] Sparse checkout status

Son Luong Ngoc <sluongng@xxxxxxxxx> · Wed, 17 Jun 2020 19:58:50 +0200

Hi Elijah,

On Wed, Jun 17, 2020 at 09:48:22AM -0700, Elijah Newren wrote:
> 
> Well, there is `git sparse-checkout list`, assuming users know they
> are in a sparse-checkout, but the whole point of my suggested change
> is that they sometimes don't.

Ah thats true.
This was added recently and definitely slipped my mind often.

> 
> This surprises me; I considered performance while writing it and kept
> it simple on that basis.  In particular:
>   * This does not cause any reading or writing of any extra files; it
> is done solely with information that is already loaded.
>   * If users aren't in a sparse-checkout, their performance overhead
> is a single if-check, which I doubt anyone can measure.
>   * If they are in a sparse-checkout, then they'd get one extra loop
> over files in the index to check the SKIP_WORKTREE bit.
> 
> In which cases would performance implications be a concern?  For a
> very simple point of reference, in a sparse-checkout of the linux
> kernel (using --cone mode and only selecting the drivers/ directory),
> I see the following timings for 'git status' in a clean checkout:
> 
> Without my change:
> [newren@tiger linux-stable (hwmon-updates|SPARSE)]$ hyperfine --warmup
> 1 'git status'
> Benchmark #1: git status
>   Time (mean ± σ):      78.8 ms ±   2.8 ms    [User: 48.9 ms, System: 76.9 ms]
>   Range (min … max):    74.0 ms …  84.0 ms    38 runs
> 
> With my change:
> [newren@eenie linux-stable (hwmon-updates|SPARSE)]$ hyperfine --warmup
> 1 'git status'
> Benchmark #1: git status
>   Time (mean ± σ):      79.8 ms ±   2.7 ms    [User: 49.3 ms, System: 77.7 ms]
>   Range (min … max):    74.8 ms …  84.5 ms    37 runs
> 
> I know the linux kernel is tiny compared to repos like Windows or
> Office, but the relative scaling considerations are identical: it's
> one extra loop through the cached entries checking a bit for each
> entry.  If people are worried about the "extra loop", I could find an
> existing loop to modify and just add an extra if-block in it so that
> we have the same number of loops.  (I'm doubtful that'd actually help,
> but if the concern is an extra loop, it'd certainly avoid that.)
> Anyway, if you've got more information about it being too costly, I'm
> happy to listen.  Otherwise, the overhead seems pretty small to me and
> it's only paid by those who would benefit from the information.
> 
> However, all that said, I have good news: Peff already implemented the
> flag users can use to avoid this extra output, and did so back in
> September of 2009.  It's called "--porcelain".  Automated commands
> should already be using it, and if they aren't, they are what needs
> fixing -- not the long form status output.

When I wrote my initial reaction, the idea of having more than just a
percentage reported back stuck in my mind, specifically with using the
in-tree checkout that I mentioned.

But yeah, that's something down the line to address, you are absolutely
correct that the current patch has no performance impact. Thanks for the
reminder about '--porcelain'.

> 
> I think having a 'git sparse-checkout status' would be a fine
> subcommand, and output like the above -- possibly also including other
> bits Stolee or I mentioned elsewhere in this thread -- would be cool
> and would be helpful; it'd complement what I'm doing here quite
> nicely.
> 
> But you're solving a related problem rather than the one I was
> focusing on, and you have left the issue I was focusing on
> unaddressed.  In particular, if users forgot that they sparsified in
> the first place, how are they going to know to run `git
> sparse-checkout status [--all]`?
> 
> I think having a simple line of output in `git status` would remind
> them.  With that reminder, they could today then go run 'git
> sparse-checkout list' or 'gvfs health' (as Stolee mentioned he uses
> internally) or './sparsify --info' (as I use internally) to get more
> info.  In the future we could provide additional things for them as
> well, such as your 'git sparse-checkout status'.
> 

I do concede that this point could be a separate problem set and addressed
separately in the future.

> 
> An aside, though, since you linked to the in-tree sparse-checkout
> definitions: When I reviewed that series, the possibility of merge
> conflicts and not knowing what sparse-checkout should have checked out
> when the in-tree defintions themselves were in a conflicted state
> seemed to me to be a pretty tough sticking point.  I'm hoping someone
> has a clever solution, but I still don't yet.  Do you?

I am no clever person, but I often take great pleasure in reading up
works of smarter people. One of which is the Google's and Facebook's Mercurial
extension sets that they opensourced a while ago to support large repos.

The test suite for FB's 'sparse' extension[1] may address your concerns?

The 'sparse' extension defines the sparse checkout definition of a
working repository. It supports '--enable-profile' which take in definition
files ('.sparse'). These profiles are often checked into the root dir 
of the repo.

> 
> Thanks,
> Elijah

Regards,
Son Luong.

[1]: https://bitbucket.org/facebook/hg-experimental/src/05ed5d06b353aca69551f3773f56a99994a1a6bf/tests/test-sparse-profiles.t#lines-115