Re: [PATCH v5 09/15] index-helper: use watchman to avoid refreshing index with lstat()

David Turner <dturner@xxxxxxxxxxxxxxxx> · Tue, 19 Apr 2016 21:01:38 -0400

On Wed, 2016-04-20 at 07:15 +0700, Duy Nguyen wrote:
> Continuing my comment from the --use-watchman patch about watchman
> not
> being supported...
> 
> On Wed, Apr 20, 2016 at 6:28 AM, David Turner <
> dturner@xxxxxxxxxxxxxxxx> wrote:
> > +static int poke_and_wait_for_reply(int fd)
> > +{
> > +       struct strbuf buf = STRBUF_INIT;
> > +       struct strbuf reply = STRBUF_INIT;
> > +       int ret = -1;
> > +       fd_set fds;
> > +       struct timeval timeout;
> > +
> > +       timeout.tv_usec = 0;
> > +       timeout.tv_sec = 1;
> > +
> > +       if (fd < 0)
> > +               return -1;
> > +
> > +       strbuf_addf(&buf, "poke %d", getpid());
> > +       if (write_in_full(fd, buf.buf, buf.len + 1) != buf.len + 1)
> > +               goto done_poke;
> > +
> > +       /* Now wait for a reply */
> > +       FD_ZERO(&fds);
> > +       FD_SET(fd, &fds);
> > +       if (select(fd + 1, &fds, NULL, NULL, &timeout) == 0)
> > +               /* No reply, giving up */
> > +               goto done_poke;
> > +
> > +       if (strbuf_getwholeline_fd(&reply, fd, 0))
> > +               goto done_poke;
> > +
> > +       if (!starts_with(reply.buf, "OK"))
> > +               goto done_poke;
> 
> ... while we could simply check USE_WATCHMAN macro and reject in
> update-index, a better solution is sending "poke %d watchman" and
> returning "OK watchman" (vs "OK") when watchman is supported and
> active. If the user already requests watchman and index-helper
> returns
> just "OK" then we can warn the user the reason of possible
> performance
> degradation. It's related to the error reporting, but I don't think
> you can send straight errors over unix socket. It's possible but it's
> a bit more complicated.

Do you mean that we should do this here?  Or in update-index -
-watchman?  If the former, I agree.  If the latter, I'm not sure; maybe
you'll be setting up your index before you've started the index helper?

> > +static void refresh_by_watchman(struct index_state *istate)
> > +{
> > +       void *shm = NULL;
> > +       int length;
> > +       int i;
> > +       struct stat st;
> > +       int fd = -1;
> > +       const char *path = git_path("shm-watchman-%s-%"PRIuMAX,
> > +                                   sha1_to_hex(istate->sha1),
> > +                                   (uintmax_t)getpid());
> > +
> > +       fd = open(path, O_RDONLY);
> > +       if (fd < 0)
> > +               return;
> > +
> > +       /*
> > +        * This watchman data is just for us -- no need to keep it
> > +        * around once we've got it open.
> > +        */
> > +       unlink(path);
> 
> This will not play well when multiple processes read and refresh the
> index at the same time. 

Multiple processes will have different pids, right?  And the pid is
included in the filename.  Am I missing something?

> This is really extra. But if we know in advance that git does not 
> need refresh(), then we should be able to tell index-helper not to 
> waste cycles contacting watchman and preparing shm-watchman-%s-%d 
> (the poke line gets more parameters). Either that, or we decouple 
> watchman requests from read_cache() requests. Only when 
> refresh_index() is called that we send something to request shm-
> watchman-%s-%d. The same for read_directory() (i.e. untracked cache 
> stuff). Hmm?

It's true that we could decouple watchman requests.  I'll look and see
if that's reasonable.

> Now that I think of it, with watchman backing us, we probably should
> just do nothing in update_index_if_able() (or write_locked_index()
> when we know only stat info is changed) when watchman is active. The
> purpose of update_index_if_able() is to avoid costly refresh, but we
> can already avoid that with watchman. And updating big index files is
> always costly (even though it should cost less with split-index). 

That sounds like a change we could make in a separate series.  It's not
a bad idea, but if our goal is to get the basic version out, we should
start there.

> Of
> course this can only be done if watchman (inotify to be precise) can
> cover whole worktree. I'm not sure how watchman behaves when there's
> not enough inotify resource to cover full worktree.

It will detect this case and will either manually recrawl (in the event
of a max_queued_events overflow) or return an error (in the event of
too many watched directories).

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html