[PATCH 00/20] Untracked cache to speed up "git status"

First of all, thanks for pointing to many more big repos. I'll look at
them tomorrow. End-of-day report (or ranting :D) time.

The series now looks good enough for public eyes. I haven't run the
test suite with untracked cache on by default, so my confidence level
is not so high, although I suspect the racy timestamp issue will
practically disable the cache anyway.

The idea is as before: exploit directory mtime to cache untracked
files. MSDN tells me NTFS on Windows exposes the "good" dir mtime
behavior, which means this series could speed up Git on Windows (I
think Karsten's fscache only deals with slow lstat, untracked files..).
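
To make the dir mtime idea concrete, here is a toy sketch in Python. It
is not the code in dir.c: the class and method names are made up for
illustration, it caches the whole directory listing instead of only
untracked files, and it ignores .gitignore and racy timestamps. It only
shows the core assumption that an unchanged directory mtime means the
set of names in that directory has not changed.

```python
import os


class UntrackedCacheSketch:
    """Toy cache of directory listings, validated by directory mtime.

    Hypothetical illustration only; not the data structure used by Git.
    """

    def __init__(self):
        self._entries = {}  # path -> (mtime_ns, sorted list of names)
        self.hits = 0
        self.misses = 0

    def list_dir(self, path):
        # One stat on the directory replaces reading all of its entries
        # when the recorded mtime still matches.
        mtime_ns = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime_ns:
            self.hits += 1
            return cached[1]

        # mtime changed (or first visit): rescan and record the new state.
        self.misses += 1
        names = sorted(os.listdir(path))
        self._entries[path] = (mtime_ns, names)
        return names
```

The first call on a directory always misses and pays for the scan plus
the bookkeeping; later calls only pay for the stat as long as nothing
was added to or removed from that directory.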

It would be nice if Windows people could try and confirm this. This
could be a good point for untracked cache vs watchman (no Windows
support, last time I checked). Usage is very simple: run "git
update-index --untracked-cache" and you're ready. Use
--no-untracked-cache to revert.

The performance numbers on webkit look good if we focus on
read_directory time only. Normally it takes 890ms. The first run with
untracked cache goes up to 922ms (filling up the cache, not counting
index write time). The next run goes down to 184ms (best case), a
gain of about 80%. lstat on directories costs only about 20ms out of
that 184ms, so I still need to see if I can lower that number further.
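
For anyone who wants to check the arithmetic, the percentages follow
directly from the three timings quoted above. The helper name below is
made up for this note; the only inputs are the numbers in this mail.

```python
def relative_change(before_ms, after_ms):
    """Fraction of time saved going from before_ms to after_ms."""
    return (before_ms - after_ms) / before_ms


baseline_ms = 890   # read_directory without untracked cache
first_run_ms = 922  # first run, filling up the cache
cached_ms = 184     # next run, best case
lstat_ms = 20       # lstat on directories within the cached run

gain = relative_change(baseline_ms, cached_ms)
first_run_overhead = -relative_change(baseline_ms, first_run_ms)
lstat_share = lstat_ms / cached_ms

print(f"best-case gain:     {gain:.1%}")                # 79.3%
print(f"first-run overhead: {first_run_overhead:.1%}")  # 3.6%
print(f"lstat share:        {lstat_share:.1%}")         # 10.9%
```

So the "about 80%" is 706ms saved out of 890ms, and the one-time cost
of filling the cache is 32ms on top of the normal run.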

"git status" performance gain is less impressive, of course: only about
38%, with refresh time now becoming the top offender. With
core.preloadindex on, the gain increases to 50%. There's still room
for improvement, maybe up to 65% by reducing read time, I think.

But again, we may not stay in the best case forever. The more dirs are
damaged, the slower it gets. At the far end of the spectrum, where all
dirs are damaged, we gain nothing but overhead. This is actually when
watchman shines, although projects that do that may need some
improvements.

Another bad point for untracked cache is that the extension data is
so specific to the core git algorithm that it probably cannot be reused
by other implementations. Again, watchman shines here.

Last note, this series conflicts with split-index due to the
write_cache API change, so not a candidate for 'pu' yet. The series
could also be fetched from

https://github.com/pclouds/git/commits/untracked-cache

except the last few timing/experimental patches.

Nguyễn Thái Ngọc Duy (20):
  dir.c: coding style fix
  dir.h: move struct exclude declaration to top level
  prep_exclude: remove the artificial PATH_MAX limit
  dir.c: optionally compute sha-1 of a .gitignore file
  untracked cache: record .gitignore information and dir hierarchy
  untracked cache: initial untracked cache validation
  untracked cache: invalidate dirs recursively if .gitignore changes
  untracked cache: record/validate dir mtime and reuse cached output
  untracked cache: mark what dirs should be recursed/saved
  untracked cache: don't open non-existent .gitignore
  untracked cache: save to an index extension
  untracked cache: load from UNTR index extension
  untracked cache: invalidate at index addition or removal
  untracked cache: print untracked statistics with $GIT_TRACE_UNTRACKED
  read-cache.c: split racy stat test to a separate function
  untracked cache: avoid racy timestamps
  status: support untracked cache
  update-index: manually enable or disable untracked cache
  update-index: test the system before enabling untracked cache
  t7063: tests for untracked cache

 .gitignore                                 |   1 +
 Makefile                                   |   1 +
 builtin/commit.c                           |   8 +
 builtin/update-index.c                     | 161 ++++++
 cache.h                                    |   5 +
 dir.c                                      | 853 +++++++++++++++++++++++++++--
 dir.h                                      | 120 +++-
 read-cache.c                               |  51 +-
 t/t7063-status-untracked-cache.sh (new +x) | 352 ++++++++++++
 test-dump-untracked-cache.c (new)          |  61 +++
 unpack-trees.c                             |   7 +-
 wt-status.c                                |   6 +
 12 files changed, 1537 insertions(+), 89 deletions(-)
 create mode 100755 t/t7063-status-untracked-cache.sh
 create mode 100644 test-dump-untracked-cache.c

-- 
1.9.1.346.ga2b5940
