Goal
~~~~

Today, git must check existing files to see if there have been changes
and scan the working directory looking for new, untracked files.  As the
number of files and folders in the working directory increases, the time
to perform these checks can become very expensive: O(# files in working
directory).

Given that the number of new or modified files is typically a very small
percentage of the total number of files, it would be much more
performant if git only had to check files and folders that potentially
had changes.  This reduces the cost to O(# modified files).

This patch series makes it possible to optionally add a hook process
that can return the set of files that may have been changed since the
requested time.  Git can then use this to limit its scan to only those
files and folders that potentially have changes.

Design
~~~~~~

A new git hook (query-fsmonitor) must exist and be enabled
(core.fsmonitor=true).  It takes a time_t formatted as a string and
outputs to stdout all files that have been modified since the requested
time.

A new 'fsmonitor' index extension has been added to store the time the
fsmonitor hook was last queried and an ewah bitmap of the current
'fsmonitor-dirty' files.  Unmarked entries are 'fsmonitor-clean', marked
entries are 'fsmonitor-dirty'.

As needed, git will call the query-fsmonitor hook proc for the set of
changes since the index was last updated.  Git then uses this set of
files along with the list saved in the fsmonitor index extension to flag
the potentially dirty index and untracked cache entries.

refresh_index() and valid_cached_dir() are updated so that any entry not
flagged as potentially dirty is not checked, as it cannot have any
changes.  This saves all the work of checking files and folders for
changes that are already known to be clean.

If git finds out that some entries are 'fsmonitor-dirty' but are really
unchanged (e.g. the file was changed, then reverted back), then git will
clear the marking in the extension.
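
To make the flagging step above concrete, here is a minimal sketch in C.
It is not the code from this series: `struct demo_entry` and the
`demo_*` functions are hypothetical stand-ins for git's index entries,
a plain int flag replaces the ewah bitmap, and the hook output is
assumed to be one path per line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an index entry. */
struct demo_entry {
	const char *path;
	int fsmonitor_dirty;	/* 1 = must be checked, 0 = known clean */
};

/*
 * Mark every entry whose path appears in hook_output as dirty.
 * Assumption: the hook prints one path per line ('\n' separated).
 * Returns the number of entries that were newly marked.
 */
static size_t demo_mark_dirty(struct demo_entry *entries, size_t nr,
			      const char *hook_output)
{
	size_t marked = 0;
	const char *p = hook_output;

	assert(entries != NULL && hook_output != NULL);

	while (*p) {
		const char *eol = strchr(p, '\n');
		size_t len = eol ? (size_t)(eol - p) : strlen(p);
		size_t i;

		for (i = 0; i < nr; i++) {
			if (strlen(entries[i].path) != len ||
			    memcmp(entries[i].path, p, len))
				continue;
			if (!entries[i].fsmonitor_dirty) {
				entries[i].fsmonitor_dirty = 1;
				marked++;
			}
		}
		p += len;
		if (*p == '\n')
			p++;
	}
	return marked;
}

/* Count the entries a refresh would still have to check. */
static size_t demo_count_to_check(const struct demo_entry *entries, size_t nr)
{
	size_t i, n = 0;

	for (i = 0; i < nr; i++)
		if (entries[i].fsmonitor_dirty)
			n++;
	return n;
}
```

Entries that were already dirty from the saved extension stay dirty, and
paths reported by the hook that are not in the index are ignored here
(in the real design they would matter for the untracked cache).  The
linear path lookup is only for brevity.
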
If git adds or updates an index entry, it is marked 'fsmonitor-dirty' to
ensure it is checked for changes.

The code is conservative so in case of any error (missing index
extension, error from hook, etc) it falls back to normal logic of
checking everything.

A sample hook is provided in query-fsmonitor.sample to integrate with
the cross platform Watchman file watching service
https://facebook.github.io/watchman/

Performance
~~~~~~~~~~~

The performance wins of this model are pretty dramatic.  Each test was
run 3 times and averaged.  "Files" is the number of files in the working
directory.  Tests were done with a cold file system cache as well as
with a warm file system cache on a HDD.  SSD speeds were typically about
10x faster than the HDD.  Typical real world results would fall
somewhere between these extremes.

*--------------------------------------------------------*
| Repo on HDD | Cache | fsmonitor=false | fsmonitor=true |
*--------------------------------------------------------*
| 3K Files    | Cold  |           0.77s |          0.55s |
+--------------------------------------------------------+
| 100K Files  | Cold  |          38.76s |          2.17s |
+--------------------------------------------------------+
| 3M Files    | Cold  |         421.55s |         18.57s |
+--------------------------------------------------------+
| 3K Files    | Warm  |           0.05s |          0.24s |
+--------------------------------------------------------+
| 100K Files  | Warm  |           1.13s |          0.40s |
+--------------------------------------------------------+
| 3M Files    | Warm  |          59.33s |          4.19s |
+--------------------------------------------------------+

Note that with the smallest repo, warm times actually increase slightly
as the overhead of calling the hook, watchman and perl outweighs the
savings of not scanning the working directory.

Open Issues
~~~~~~~~~~~

The index extension currently has a 32 bit version number, a 64 bit time
and a 32 bit bitmap size.
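
For illustration, a header with that layout can be read without any
aligned loads by assembling the fields byte by byte.  The sketch below
is not taken from this series: the `demo_*` names are hypothetical, and
it assumes the fields are stored back to back in the order listed, in
network byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of the extension header. */
struct demo_fsmonitor_header {
	uint32_t version;
	uint64_t last_update;
	uint32_t bitmap_size;
};

/* Assumed on-disk size: 4 + 8 + 4 bytes, with no padding. */
#define DEMO_HEADER_SIZE 16

/* Read a big-endian 32-bit value from any address, aligned or not. */
static uint32_t demo_get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read a big-endian 64-bit value as two 32-bit halves. */
static uint64_t demo_get_be64(const unsigned char *p)
{
	return ((uint64_t)demo_get_be32(p) << 32) | demo_get_be32(p + 4);
}

/* Returns 0 on success, -1 if the buffer is too short. */
static int demo_parse_header(const unsigned char *data, size_t len,
			     struct demo_fsmonitor_header *out)
{
	assert(data != NULL && out != NULL);

	if (len < DEMO_HEADER_SIZE)
		return -1;
	out->version = demo_get_be32(data);
	out->last_update = demo_get_be64(data + 4);
	out->bitmap_size = demo_get_be32(data + 12);
	return 0;
}
```

Because only single bytes are dereferenced, this works at any offset in
the mapped index, at the cost of a few shifts per field.
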
Do I need to quad-align the version and bitmap size in the index
extension or can all supported platforms handle dereferencing memory
that isn't quad aligned?

Credits
~~~~~~~

Idea taken and code refactored from
http://public-inbox.org/git/1466914464-10358-1-git-send-email-novalis@xxxxxxxxxxx/

Current version as a fork of GFW on GitHub here:
https://github.com/benpeart/git-for-windows/tree/fsmonitor

Ben Peart (5):
  dir: make lookup_untracked() available outside of dir.c
  Teach git to optionally utilize a file system monitor to speed up
    detecting new or changed files.
  fsmonitor: add test cases for fsmonitor extension
  Add documentation for the fsmonitor extension.  This includes the
    core.fsmonitor setting, the query-fsmonitor hook, and the
    fsmonitor index extension.
  Add a sample query-fsmonitor hook script that integrates with the
    cross platform Watchman file watching service.

 Documentation/config.txt                 |   7 +
 Documentation/githooks.txt               |  23 +++
 Documentation/technical/index-format.txt |  18 +++
 Makefile                                 |   1 +
 builtin/update-index.c                   |   1 +
 cache.h                                  |   5 +
 config.c                                 |   5 +
 dir.c                                    |  15 +-
 dir.h                                    |   5 +
 entry.c                                  |   1 +
 environment.c                            |   1 +
 fsmonitor.c                              | 233 +++++++++++++++++++++++++++++++
 fsmonitor.h                              |   9 ++
 read-cache.c                             |  28 +++-
 t/t7519-status-fsmonitor.sh              | 134 ++++++++++++++++++
 templates/hooks--query-fsmonitor.sample  |  27 ++++
 unpack-trees.c                           |   1 +
 17 files changed, 511 insertions(+), 3 deletions(-)
 create mode 100644 fsmonitor.c
 create mode 100644 fsmonitor.h
 create mode 100644 t/t7519-status-fsmonitor.sh
 create mode 100644 templates/hooks--query-fsmonitor.sample

-- 
2.13.0.windows.1.6.g4597375fc3