Re: [PATCH 19/19] hack: watchman/untracked cache mashup

Duy Nguyen <pclouds@xxxxxxxxx> · Tue, 15 Mar 2016 19:31:00 +0700

On Thu, Mar 10, 2016 at 1:36 AM, David Turner <dturner@xxxxxxxxxxxxxxxx> wrote:
>  static struct watchman_query *make_query(const char *last_update)
> @@ -60,8 +61,24 @@ static void update_index(struct index_state *istate,
>                         continue;
>
>                 pos = index_name_pos(istate, wm->name, strlen(wm->name));
> -               if (pos < 0)
> +               if (pos < 0) {
> +                       if (istate->untracked) {
> +                               char *name = xstrdup(wm->name);
> +                               char *dname = dirname(name);
> +
> +                               /*
> +                                * dirname() returns '.' for the root,
> +                                * but we call it ''.
> +                                */
> +                               if (dname[0] == '.' && dname[1] == 0)
> +                                       string_list_append(&istate->untracked->invalid_untracked, "");
> +                               else
> +                                       string_list_append(&istate->untracked->invalid_untracked,
> +                                                          dname);
> +                               free(name);
> +                       }
>                         continue;
> +               }

So if we detect an updated file that's not in the index, we
invalidate that path's directory in the untracked cache, correct? If
so, we may invalidate more than necessary. Imagine a.o is already
ignored: if it's rebuilt, we should not need to update the untracked
cache.

What I had in mind (and argued with watchman devs a bit [1]) was to
maintain the file list of each clock and compare the file lists of
different clocks to figure out which files were added or deleted. Then
we can generate the invalidation list more accurately. We need to keep
at least one file list that corresponds to the clock saved in the
index. When we get a refresh request, we get a new file list (with a
new clock), do a diff, then save the invalidation list as an
extension. Once we notice that the new clock has been written back to
the index, we can discard older file lists. In theory we should not
need to keep too many file lists, so even if one list is big, it
should not be a big problem.
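
To make the idea concrete, here is a rough, standalone sketch of the
diff step. It is not against git's or watchman's real APIs:
diff_file_lists(), parent_dir(), collect_dir() and struct
invalidation_list are all made-up names, and I assume both file lists
are already sorted with strcmp() and contain no duplicates. A file that
exists in both lists (e.g. a rebuilt a.o) produces no entry, so only
additions and deletions invalidate a directory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: called once per added (1) or deleted (0) path. */
typedef void (*diff_fn)(const char *path, int added, void *data);

/*
 * Walk two sorted, duplicate-free file lists in parallel. Paths found
 * only in old_list are reported as deleted, paths found only in
 * new_list as added. Returns the number of differences.
 */
static size_t diff_file_lists(const char **old_list, size_t old_nr,
			      const char **new_list, size_t new_nr,
			      diff_fn fn, void *data)
{
	size_t i = 0, j = 0, changes = 0;

	assert(fn != NULL);
	while (i < old_nr || j < new_nr) {
		int cmp;

		if (i == old_nr)
			cmp = 1;	/* only new entries left */
		else if (j == new_nr)
			cmp = -1;	/* only old entries left */
		else
			cmp = strcmp(old_list[i], new_list[j]);

		if (cmp < 0) {
			fn(old_list[i++], 0, data);
			changes++;
		} else if (cmp > 0) {
			fn(new_list[j++], 1, data);
			changes++;
		} else {
			i++;
			j++;
		}
	}
	return changes;
}

/* Copy the directory part of path into buf; the root is "" (not "."). */
static void parent_dir(const char *path, char *buf, size_t len)
{
	const char *slash = strrchr(path, '/');
	size_t n = slash ? (size_t)(slash - path) : 0;

	if (n >= len)
		n = len - 1;
	memcpy(buf, path, n);
	buf[n] = '\0';
}

/* Toy fixed-size invalidation list, for illustration only. */
struct invalidation_list {
	char dirs[16][256];
	size_t nr;
};

/* Record the parent directory, skipping a repeat of the last entry. */
static void collect_dir(const char *path, int added, void *data)
{
	struct invalidation_list *inv = data;
	char dir[256];

	(void)added;	/* additions and deletions both invalidate */
	parent_dir(path, dir, sizeof(dir));
	if (inv->nr && !strcmp(inv->dirs[inv->nr - 1], dir))
		return;
	if (inv->nr < 16)
		strcpy(inv->dirs[inv->nr++], dir);
}
```

Calling diff_file_lists() with collect_dir as the callback fills the
list with the directories to invalidate. Because the input is sorted,
changes in the same directory tend to arrive together, so comparing
against the last entry removes most duplicates; a real implementation
would likely want a proper set.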

I have a note about race conditions with this approach, but I can't
recall exactly why or how they happen yet. My recent thinking,
though, is that race conditions when you do "git status" are probably
tolerable. After all, if some I/O is going on while you run git-status,
you're guaranteed to miss some updates.

[1] https://github.com/facebook/watchman/issues/65
-- 
Duy