Re: Some rough edges of core.fsmonitor

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 1/30/2018 6:16 PM, Ævar Arnfjörð Bjarmason wrote:

On Tue, Jan 30 2018, Ben Peart jotted:

While some of these issues have been discussed in other threads, I
thought I'd summarize my thoughts here.

Thanks for this & your other reply. I'm going to get to testing some of
Duy's patches soon, and if you have some other relevant WIP I'd be happy
to try them, but meanwhile replying to a few of these:

On 1/26/2018 7:28 PM, Ævar Arnfjörð Bjarmason wrote:
I just got around to testing this since it landed, for context some
previous poking of mine in [1].

Issues / stuff I've noticed:

1) We end up invalidating the untracked cache because stuff in .git/
changed. For example:

      01:09:24.975524 fsmonitor.c:173         fsmonitor process '.git/hooks/fsmonitor-watchman' returned success
      01:09:24.975548 fsmonitor.c:138         fsmonitor_refresh_callback '.git'
      01:09:24.975556 fsmonitor.c:138         fsmonitor_refresh_callback '.git/config'
      01:09:24.975568 fsmonitor.c:138         fsmonitor_refresh_callback '.git/index'
      01:09:25.122726 fsmonitor.c:91          write fsmonitor extension successful

Am I missing something or should we do something like:

      diff --git a/fsmonitor.c b/fsmonitor.c
      index 0af7c4edba..5067b89bda 100644
      --- a/fsmonitor.c
      +++ b/fsmonitor.c
      @@ -118,7 +118,12 @@ static int query_fsmonitor(int version, uint64_t last_update, struct strbuf *que

       static void fsmonitor_refresh_callback(struct index_state *istate, const char *name)
       {
      -       int pos = index_name_pos(istate, name, strlen(name));
      +       int pos;
      +
      +       if (!strcmp(name, ".git") || starts_with(name, ".git/"))
      +               return;
      +
      +       pos = index_name_pos(istate, name, strlen(name));

              if (pos >= 0) {
                      struct cache_entry *ce = istate->cache[pos];

With that patch applied status on a large repo[2] goes from a consistent
~180-200ms to ~140-150ms, since we're not invalidating some of the UC
structure


I favor making this optimization by updating
untracked_cache_invalidate_path() so that it ignores paths under
get_git_dir() and doesn't invalidate the untracked cache or flag the
index as dirty.

*nod*

2) We re-write out the index even though we know nothing changed

When you first run with core.fsmonitor it needs to
mark_fsmonitor_clean() for every path, but is there a reason for why we
wouldn't supply the equivalent of GIT_OPTIONAL_LOCKS=0 if all paths are
marked and we know from the hook that nothing changed? Why write out the
index again?


Writing out the index when core.fsmonitor is first turned on is
necessary to get the index extension added with the current state of
the dirty flags.  Given it is a one time cost, I don't think we have
anything worth trying to optimize here.

Indeed, that makes sense. What I was showing here is even after the
initial setup we continue to write it out when we know nothing changed.

We do that anyway without fsmonitor, but this seemed like a worthwhile
thing to optimize.


There is already logic to avoid writing out the index unless there is something that requires it. In my testing, it was often the untracked cache marking the index as dirty that was causing the unexpected writes.

The patch to make the untracked cache only flag the index as dirty when the feature is being turned on or off is pretty simple (see below). The challenge was that many of the untracked cache tests assume all changes are saved to disk after every command so they fail after making this change. The patch below does a simple hack of checking for a test environment variable and only forcing the index writes if it is set.

Internally, we simply turned off the untracked cache as it's usefulness in combination with GVFS is limited and without the patch, it actually slowed down common operations more than it sped them up.

Typically, changes to the untracked cache don't accumulate long before the user does some operation that requires the index to be written out at which point the untracked cache is updated as well.


diff --git a/dir.c b/dir.c
index 5e93a1350b..af1d33aae1 100644
--- a/dir.c
+++ b/dir.c
@@ -2256,7 +2256,8 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
                                 dir->untracked->gitignore_invalidated,
                                 dir->untracked->dir_invalidated,
                                 dir->untracked->dir_opened);
-               if (dir->untracked == istate->untracked &&
+               if (getenv("GIT_TEST_UNTRACKED_CACHE") &&
+                       dir->untracked == istate->untracked &&
                    (dir->untracked->dir_opened ||
                     dir->untracked->gitignore_invalidated ||
                     dir->untracked->dir_invalidated))
diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
index ea9383e8cb..e5811b6ef2 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -14,6 +14,8 @@ test_description='test untracked cache'
 # See <20160803174522.5571-1-pclouds@xxxxxxxxx> if you want to know
 # more.

+export GIT_TEST_UNTRACKED_CACHE=true
+
 sync_mtime () {
        if test_have_prereq BUSYBOX
        then





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux