Re: Some rough edges of core.fsmonitor

On 1/27/2018 2:01 PM, Ævar Arnfjörð Bjarmason wrote:

On Sat, Jan 27 2018, Duy Nguyen jotted:

On Sat, Jan 27, 2018 at 07:39:27PM +0700, Duy Nguyen wrote:
On Sat, Jan 27, 2018 at 6:43 PM, Ævar Arnfjörð Bjarmason
<avarab@xxxxxxxxx> wrote:
a) no fsmonitor

     $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status
     12:32:44.947651 read-cache.c:1890       performance: 0.053153609 s: read cache .git/index
     12:32:44.967943 preload-index.c:112     performance: 0.020161093 s: preload index
     12:32:44.974217 read-cache.c:1446       performance: 0.006230611 s: refresh index

...

b) with fsmonitor

     $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status
     12:34:23.833625 read-cache.c:1890       performance: 0.049485685 s: read cache .git/index
     12:34:23.838622 preload-index.c:112     performance: 0.001221197 s: preload index
     12:34:23.858723 fsmonitor.c:170         performance: 0.020059647 s: fsmonitor process '.git/hooks/fsmonitor-watchman'
     12:34:23.871532 read-cache.c:1446       performance: 0.032870818 s: refresh index

Hmm... why does refresh take longer with fsmonitor/watchman? With help
from watchman, we know which files are modified. We don't need manual
stat()'ing, so this line should be lower than in the "no fsmonitor"
case, which is 0.006230611s.

Ahh... my patch probably does not account for fsmonitor being activated
lazily inside the refresh_index() call. The patch below should fix it.

Will have to get those numbers to you later, or alternatively clone
https://github.com/avar/2015-04-03-1M-git (or some other test repo) and
test it yourself, sorry. Don't have time to follow up much this weekend.

But between your normal refresh time (0.020 preload + 0.006 actual
refresh) and fsmonitor taking 0.020 just to talk to watchman, this
repo seems "too small" for fsmonitor/watchman to shine.
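To make the comparison above concrete, the timings can be totalled straight from the GIT_TRACE_PERFORMANCE output. This is a minimal sketch in C that assumes only the line shape visible in the traces quoted here ("<time> <file>:<line> performance: <secs> s: <label>"); the helper names parse_perf_line() and perf_total() are hypothetical and not part of git.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define PERF_MARK "performance: "
#define LABEL_SEP " s: "

/*
 * Parse one GIT_TRACE_PERFORMANCE line of the shape
 *   "<time> <file>:<line>   performance: <secs> s: <label>"
 * On success, store the duration in *secs, point *label at the label
 * text and return 1. Return 0 if the line does not have that shape.
 */
static int parse_perf_line(const char *line, double *secs, const char **label)
{
	const char *num = strstr(line, PERF_MARK);
	const char *sep;
	char *end;

	if (!num)
		return 0;
	num += strlen(PERF_MARK);
	*secs = strtod(num, &end);
	if (end == num)
		return 0;		/* no number after the marker */
	sep = strstr(end, LABEL_SEP);
	if (!sep)
		return 0;
	*label = sep + strlen(LABEL_SEP);
	return 1;
}

/* Sum the durations of all trace lines whose label starts with prefix. */
static double perf_total(const char *const *lines, size_t n, const char *prefix)
{
	double total = 0.0;

	for (size_t i = 0; i < n; i++) {
		double secs;
		const char *label;

		if (parse_perf_line(lines[i], &secs, &label) &&
		    !strncmp(label, prefix, strlen(prefix)))
			total += secs;
	}
	return total;
}
```

With the "b) with fsmonitor" lines, perf_total(..., "refresh index") minus perf_total(..., "fsmonitor process") comes to about 0.0128s. The subtraction is only meaningful because, as the thread notes further down, the fsmonitor call happens inside refresh_index() in the traced build. The "a) no fsmonitor" preload and refresh lines sum to about 0.0264s.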

Surely that's an implementation limitation and not something inherent,
given that watchman itself returns in 5ms?

I.e., status could work like this, no?

  1. At start, record the timestamp & find out the canonical state via
     some expensive method.
  2. Print out xyz changed, abc added, etc.
  3. Record *just* what status would report about xyz, abc, etc.
  4. On subsequent git status runs, just amend that information, e.g. if
     watchman says nothing changed, output $(cat .git/last-status-output).

We shouldn't need to be reading the entire index in the common case
where just a few things change.
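Steps 3 and 4 amount to keeping a small cache of what status last printed and patching only the paths the watcher reports. A minimal sketch in C, assuming the caller has already asked the watcher (e.g. watchman) which paths changed since the recorded clock and has recomputed the status of just those paths; struct status_cache, amend_status() and the fixed-size table are hypothetical illustrations, not existing git code.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 64
#define MAX_PATH_LEN 256

/* One cached status line: a path and its two-letter code (e.g. " M", "??"). */
struct cached_entry {
	char path[MAX_PATH_LEN];
	char status[3];		/* "" means the path is clean again */
};

/* Hypothetical cache of what status last reported, and as of when. */
struct status_cache {
	long long since;	/* watcher clock/timestamp of the last update */
	size_t n;
	struct cached_entry e[MAX_ENTRIES];
};

static struct cached_entry *find_entry(struct status_cache *c, const char *path)
{
	for (size_t i = 0; i < c->n; i++)
		if (!strcmp(c->e[i].path, path))
			return &c->e[i];
	return NULL;
}

/*
 * Amend the cache with freshly computed statuses for only the paths the
 * watcher reported as changed. Every other path keeps its cached status,
 * so with n_updates == 0 the cached output can be printed as-is.
 * Returns 0 on success, -1 if the table is full (caller should fall back
 * to a full scan).
 */
static int amend_status(struct status_cache *c,
			const struct cached_entry *updates, size_t n_updates,
			long long now)
{
	for (size_t i = 0; i < n_updates; i++) {
		struct cached_entry *ent = find_entry(c, updates[i].path);

		if (!updates[i].status[0]) {
			/* Clean again: drop it by moving the last entry into its slot. */
			if (ent)
				*ent = c->e[--c->n];
			continue;
		}
		if (!ent) {
			if (c->n == MAX_ENTRIES)
				return -1;
			ent = &c->e[c->n++];
		}
		*ent = updates[i];
	}
	c->since = now;
	return 0;
}
```

A real implementation would also have to persist the cache (the .git/last-status-output idea above) and invalidate it when the index or HEAD changes, since either can alter status output without touching the working tree; none of that is shown.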


I agree that reading the entire index in the common case is rather expensive. It is, however, the model we have today and all the code in git assumes all cache entries are in memory.

We are interested in pursuing a patch series that would enable higher performance, especially with large and/or sparse repos, by making the index sparse, hierarchical, and incrementally readable/writable. As you might expect, that is a lot of work and is far beyond what we can address in this patch series.

There are also a lot of things that use status just to check "are we
clean?". Those would only need to record the last known timestamp when
the tree was clean and then ask watchman whether there were any changes;
if not, we're done.
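The "are we clean?" shortcut reduces to a three-way decision. A minimal sketch in C, assuming the caller knows whether a clean-tree baseline was recorded and how many paths the watcher reports as changed since then (negative if the watcher cannot answer, e.g. because it was restarted); quick_clean_check() and its enum are hypothetical names.

```c
/* Possible answers from the hypothetical fast "are we clean?" check. */
enum clean_answer {
	TREE_CLEAN,	/* nothing changed since the tree was last known clean */
	CHECK_REPORTED,	/* examine only the paths the watcher reported */
	FULL_SCAN	/* no usable baseline: run a normal full status */
};

/*
 * have_baseline: a clean-tree timestamp/clock was recorded earlier.
 * n_changed:     paths the watcher reports as changed since that point,
 *                or a negative value if the watcher could not answer.
 */
static enum clean_answer quick_clean_check(int have_baseline, long n_changed)
{
	if (!have_baseline || n_changed < 0)
		return FULL_SCAN;
	return n_changed == 0 ? TREE_CLEAN : CHECK_REPORTED;
}
```

A non-zero count does not by itself mean the tree is dirty (a file may have been edited and then reverted), which is why that case only narrows the check to the reported paths instead of answering "dirty".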

I'm still a bit curious why the refresh index time, after excluding
0.020 for fsmonitor, is still 0.012s. What does it do? It should really
be doing nothing. Either way, read index time seems to be the elephant
in the room now.

-- 8< --
diff --git a/read-cache.c b/read-cache.c
index eac74bc9f1..d60e0a8480 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1367,12 +1367,21 @@ int refresh_index(struct index_state *istate, unsigned int flags,
  	unsigned int options = (CE_MATCH_REFRESH |
  				(really ? CE_MATCH_IGNORE_VALID : 0) |
  				(not_new ? CE_MATCH_IGNORE_MISSING : 0));
+	int ignore_fsmonitor = options & CE_MATCH_IGNORE_FSMONITOR;
  	const char *modified_fmt;
  	const char *deleted_fmt;
  	const char *typechange_fmt;
  	const char *added_fmt;
  	const char *unmerged_fmt;
-	uint64_t start = getnanotime();
+	uint64_t start;
+
+	/*
+	 * If fsmonitor is used, force its communication early to
+	 * accurately measure how long this function takes without it.
+	 */
+	if (!ignore_fsmonitor)
+		refresh_fsmonitor(istate);
+	start = getnanotime();

  	modified_fmt = (in_porcelain ? "M\t%s\n" : "%s: needs update\n");
  	deleted_fmt = (in_porcelain ? "D\t%s\n" : "%s: needs update\n");
-- 8< --


