Re: [PATCH 1/4] fsmonitor: use fsmonitor data in `git diff`

Taylor Blau <me@xxxxxxxxxxxx> · Sun, 18 Oct 2020 00:17:47 -0400

On Sun, Oct 18, 2020 at 01:54:44AM +0100, Nipunn Koorapati wrote:
> > 20% of the working tree, running refresh_fsmonitor() for the entire
> > working tree is still a win, but if we are only checking less than
> > that, we are better off without fsmonitor, or does a tradeoff like
> > that exist?
>
> My understanding is that refresh_fsmonitor is
> O(delta_since_last_refresh) - so for developers
> with large repositories - this cost will amortize out over subsequent
> commands, so I don't
> think it's worth investigating this tradeoff here.
> As a user of large repositories, I expect that my major source of
> fsmonitor activity to be user
> intent (eg git pull, or intentionally copying/editing a large number
> of files). After such a command,
> I expect my next git command to be slower - that would be unsurprising.
>
> I think the tradeoff could be made for small diff requests, but I
> don't think it's worth adding complexity here -
> as that user will just have to pay the cost on their next git command.

Hmm. I do agree that I'd like to stay out of the business of trying to
figure out exactly what that trade-off is (although I'm sure that it
exists), only because it seems likely to vary to a large extent from
repository to repository. (That is, 20% may be a good number for some
repository, but a terrible choice for another).

But, I think that we can invoke watchman better here; the
fsmonitor-watchman hook has no notion of a "pathspec", so every query
just asks for everything that isn't in '$GIT_DIR'. Is there anything
preventing us from taking an optional pathspec and building up a more
targeted query?

There is some overhead to invoke the hook and talk to watchman, but
I'd expect that to be dwarfed by not having to issue O(# files)
syscalls.

> >
> > > +              */
> > > +             if (ce->ce_flags & CE_VALID || ce->ce_flags & CE_FSMONITOR_VALID) {
> >
> > Would it become easier to read, if written like this instead?
> >
> >                 if (ce->ce_flags & (CE_VALID | CE_FSMONITOR_VALID)) {
>
> I personally find this more confusing because it involves multiple
> bitwise ops, but this
> is potentially due to me having more mental practice thinking about
> boolean operators vs bitwise operators.
> I'm more than happy to align with the common pattern of the repo. I'll
> change this.

I don't have an opinion, nor do I think that git.git has an established
practice of doing one over the other. For what it's worth, my two-cents
is that Junio's suggestion is easier to read.

Thanks,
Taylor