[PATCH v6 00/12] Fast git status via a file system watcher

This is a fairly significant rewrite since V5. The big changes include:

Multiple functions including preload_index(), ie_match_stat(), and
refresh_cache_ent() have been updated to honor the CE_FSMONITOR_VALID bit,
following the same pattern as skip_worktree and CE_VALID.  As a result,
the performance improvements apply to all git commands that would otherwise
have had to scan the entire working directory.

core.fsmonitor is now a registered command (instead of a hook) to
provide additional flexibility.  It is called when needed to ensure the
state of the index is up-to-date.

The Watchman integration script is now written entirely in Perl to
minimize spawning additional helper commands.  This, along with the other
changes, has helped reduce the overhead and made the extension applicable
to more (i.e. smaller) repos.

There are additional opportunities for performance improvements but I
wanted to get this version out there and then build on it as the
foundation.  Some potential examples of future patches include:

 - call the integration script on a background thread so that it can
   execute in parallel.

 - optimize tree traversal by pruning out entire branches that do not
   contain any changes.

Other optimizations likely exist where knowledge that files have not
changed can be used to short-circuit some of the normal workflow.

Performance
===========

With the various enhancements, performance has improved, especially
for smaller repos.  The included perf test compares status times without
fsmonitor to those with fsmonitor using the provided Watchman integration
script.

On very small repos (<10K files), the overhead of calling out to
Watchman exceeds the savings.  Once repos reach 10K files the savings
kick in, and for larger repos the savings are dramatic.

Test with 10,000 files                                           this tree
------------------------------------------------------------------------
7519.2: status (fsmonitor=.git/hooks/fsmonitor-watchman)         0.35(0.03+0.04)
7519.3: status -uno (fsmonitor=.git/hooks/fsmonitor-watchman)    0.37(0.00+0.09)
7519.4: status -uall (fsmonitor=.git/hooks/fsmonitor-watchman)   0.43(0.03+0.06)
7519.6: status (fsmonitor=)                                      0.45(0.00+0.07)
7519.7: status -uno (fsmonitor=)                                 0.40(0.03+0.07)
7519.8: status -uall (fsmonitor=)                                0.44(0.04+0.04)

Test with 100,000 files                                          this tree
------------------------------------------------------------------------
7519.2: status (fsmonitor=.git/hooks/fsmonitor-watchman)         0.33(0.01+0.03)
7519.3: status -uno (fsmonitor=.git/hooks/fsmonitor-watchman)    0.36(0.00+0.06)
7519.4: status -uall (fsmonitor=.git/hooks/fsmonitor-watchman)   0.93(0.00+0.07)
7519.6: status (fsmonitor=)                                      2.66(0.04+0.03)
7519.7: status -uno (fsmonitor=)                                 2.44(0.01+0.06)
7519.8: status -uall (fsmonitor=)                                2.94(0.03+0.07)

Test with 1,000,000 files                                        this tree
------------------------------------------------------------------------
7519.2: status (fsmonitor=.git/hooks/fsmonitor-watchman)         1.45(0.00+0.06)
7519.3: status -uno (fsmonitor=.git/hooks/fsmonitor-watchman)    0.88(0.01+0.04)
7519.4: status -uall (fsmonitor=.git/hooks/fsmonitor-watchman)   6.14(0.03+0.04)
7519.6: status (fsmonitor=)                                      25.91(0.04+0.06)
7519.7: status -uno (fsmonitor=)                                 23.96(0.04+0.03)
7519.8: status -uall (fsmonitor=)                                28.81(0.00+0.07)

Note: all numbers above were measured with a warm disk cache on a fast
SSD.  Real-world gains are often dramatically larger, because fsmonitor
can eliminate the file I/O needed to lstat every file and then traverse
the working directory looking for untracked files.  For example, a cold
status without fsmonitor on an HDD with 1M files takes 1m22.774s:

$ time git -c core.fsmonitor= status
On branch p0006-ballast

It took 2.09 seconds to enumerate untracked files. 'status -uno'
may speed it up, but you have to be careful not to forget to add
new files yourself (see 'git help status').
nothing to commit, working tree clean

real    1m22.774s
user    0m0.000s
sys     0m0.000s


Ben Peart (12):
  bswap: add 64 bit endianness helper get_be64
  preload-index: add override to enable testing preload-index
  update-index: add a new --force-write-index option
  fsmonitor: teach git to optionally utilize a file system monitor to
    speed up detecting new or changed files.
  fsmonitor: add documentation for the fsmonitor extension.
  ls-files: Add support in ls-files to display the fsmonitor valid bit
  update-index: add fsmonitor support to update-index
  fsmonitor: add a test tool to dump the index extension
  split-index: disable the fsmonitor extension when running the split
    index test
  fsmonitor: add test cases for fsmonitor extension
  fsmonitor: add a sample integration script for Watchman
  fsmonitor: add a performance test

 Documentation/config.txt                   |   6 +
 Documentation/githooks.txt                 |  23 +++
 Documentation/technical/index-format.txt   |  19 +++
 Makefile                                   |   3 +
 apply.c                                    |   2 +-
 builtin/ls-files.c                         |   8 +-
 builtin/update-index.c                     |  26 ++-
 cache.h                                    |  10 +-
 compat/bswap.h                             |  22 +++
 config.c                                   |  14 ++
 config.h                                   |   1 +
 diff-lib.c                                 |   2 +
 dir.c                                      |  27 +--
 dir.h                                      |   2 +
 entry.c                                    |   4 +-
 environment.c                              |   1 +
 fsmonitor.c                                | 253 ++++++++++++++++++++++++++++
 fsmonitor.h                                |  61 +++++++
 preload-index.c                            |   8 +-
 read-cache.c                               |  49 +++++-
 submodule.c                                |   2 +-
 t/helper/.gitignore                        |   1 +
 t/helper/test-drop-caches.c                | 161 ++++++++++++++++++
 t/helper/test-dump-fsmonitor.c             |  21 +++
 t/perf/p7519-fsmonitor.sh                  | 184 ++++++++++++++++++++
 t/t1700-split-index.sh                     |   1 +
 t/t7519-status-fsmonitor.sh                | 259 +++++++++++++++++++++++++++++
 t/t7519/fsmonitor-all                      |  23 +++
 t/t7519/fsmonitor-none                     |  21 +++
 t/t7519/fsmonitor-watchman                 | 128 ++++++++++++++
 templates/hooks--fsmonitor-watchman.sample | 119 +++++++++++++
 unpack-trees.c                             |   8 +-
 32 files changed, 1440 insertions(+), 29 deletions(-)
 create mode 100644 fsmonitor.c
 create mode 100644 fsmonitor.h
 create mode 100644 t/helper/test-drop-caches.c
 create mode 100644 t/helper/test-dump-fsmonitor.c
 create mode 100755 t/perf/p7519-fsmonitor.sh
 create mode 100755 t/t7519-status-fsmonitor.sh
 create mode 100755 t/t7519/fsmonitor-all
 create mode 100755 t/t7519/fsmonitor-none
 create mode 100755 t/t7519/fsmonitor-watchman
 create mode 100755 templates/hooks--fsmonitor-watchman.sample

-- 
2.14.1.548.ge54b1befee.dirty



