Re: [PATCH v7 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 9/19/2017 10:28 PM, Junio C Hamano wrote:
Ben Peart <benpeart@xxxxxxxxxxxxx> writes:

+/* do stat comparison even if CE_FSMONITOR_VALID is true */
+#define CE_MATCH_IGNORE_FSMONITOR 0X20

Hmm, when should a programmer use this flag?


Pretty much the same places you would also use CE_MATCH_IGNORE_VALID and CE_MATCH_IGNORE_SKIP_WORKTREE which serve the same role for those features. That is generally when you are about to overwrite data so want to be *really* sure you have what you think you have.

The other place I used it was in preload_index(). In that case, I didn't want to trigger the call to refresh_fsmonitor() as preload_index() is about trying to do a fast precompute of state for the bulk of the index entries but is not required for correctness as refresh_cache_ent() will ensure any "missed" by preload_index() are up-to-date if/when that is needed.

+int git_config_get_fsmonitor(void)
+{
+	if (git_config_get_pathname("core.fsmonitor", &core_fsmonitor))
+		core_fsmonitor = getenv("GIT_FSMONITOR_TEST");

Will the environment be part of the published API, or is it a
remnant of a useful tool for debugging while developing the feature?

If it is the former (and I'd say why not, even though "git -c
core.fsmontor=..." may be easy enough), we might want to rename it
to replace _TEST with _PROGRAM or something and document it.


This was added this to facilitate testing. That is why it has the magic naming of "GIT_***_TEST" which is the only way I found to ensure that env variable gets passed to tests. Its use is discussed in patch 10 which contains the tests.

diff --git a/diff-lib.c b/diff-lib.c
index 2a52b07954..23c6d03ca9 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -12,6 +12,7 @@
  #include "refs.h"
  #include "submodule.h"
  #include "dir.h"
+#include "fsmonitor.h"
/*
   * diff-files
@@ -228,6 +229,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
if (!changed && !dirty_submodule) {
  			ce_mark_uptodate(ce);
+			mark_fsmonitor_valid(ce);

I notice all calls to mark_fsmonitor_valid(ce) always follow a call
to ce_mark_uptodate(ce).  Should the call to the former be made as
part of the latter instead?  Are there cases where we want to call
ce_mark_uptodate(ce) to mark the ce up-to-date, yet we do not want
to call mark_fsmonitor_valid(ce)?  How does a programmer tell when
s/he wants to call ce_mark_uptodate(ce) if s/he also should call
mark_fsmonitor_valid(ce)?

mark_fsmonitor_valid(ce) is the way to indicate that cache entries that were once fsmonitor dirty are now properly reflected in the index so can come off the "dirty" list. It can't really be combined with ce_mark_uptodate(ce) as that would prevent the CE_MATCH_IGNORE_FSMONITOR logic:

	if (!ignore_skip_worktree && ce_skip_worktree(ce)) {
		ce_mark_uptodate(ce);
		return ce;
	}
	if (!ignore_valid && (ce->ce_flags & CE_VALID)) {
		ce_mark_uptodate(ce);
		return ce;
	}
	if (!ignore_fsmonitor && (ce->ce_flags & CE_FSMONITOR_VALID)) {
		ce_mark_uptodate(ce);
		return ce;
	}

In addition, fsmonitor is an optional feature and so the mark_fsmonitor_valid(ce) call should only happen when the feature is turned on. I tried to keep it as simple as possible by making that test and set logic part of the function but still be performant by making the function inline.


Together with "when to pass IGNORE_FSMONITOR" question, is there a
summary that guides new programmers to answer these questions in the
new documentation?


Only the discussion in this mail thread. I could add something to the function header in fsmonitor.h if that would help. How about something like:

diff --git a/fsmonitor.h b/fsmonitor.h
index c2240b811a..03bf3efe61 100644
--- a/fsmonitor.h
+++ b/fsmonitor.h
@@ -34,9 +34,11 @@ extern void tweak_fsmonitor(struct index_state *istate);
  */
 extern void refresh_fsmonitor(struct index_state *istate);

-/*
- * Set the given cache entries CE_FSMONITOR_VALID bit.
- */
+/*
+ * Set the given cache entries CE_FSMONITOR_VALID bit. This should be
+ * called any time the cache entry has been updated to reflect the
+ * current state of the file on disk.
+ */
 static inline void mark_fsmonitor_valid(struct cache_entry *ce)
 {
        if (core_fsmonitor) {
@@ -46,8 +48,11 @@ static inline void mark_fsmonitor_valid(struct cache_entry *ce)
 }

 /*
- * Clear the given cache entry's CE_FSMONITOR_VALID bit and invalidate any
- * corresponding untracked cache directory structures.
+ * Clear the given cache entry's CE_FSMONITOR_VALID bit and invalidate
+ * any corresponding untracked cache directory structures. This should
+ * be called any time git creates or modifies a file that should
+ * trigger an lstat() or invalidate the untracked cache for the
+ * corresponding directory
  */
static inline void mark_fsmonitor_invalid(struct index_state *istate, struct cache_entry *ce)
 {


diff --git a/dir.h b/dir.h
index e3717055d1..fab8fc1561 100644
--- a/dir.h
+++ b/dir.h
@@ -139,6 +139,8 @@ struct untracked_cache {
  	int gitignore_invalidated;
  	int dir_invalidated;
  	int dir_opened;
+	/* fsmonitor invalidation data */
+	unsigned int use_fsmonitor : 1;

This makes it look like we will add a bit more fields in later
patches that are about "invalidation" around fsmonitor, but it
appears that such an addition never happens til the end of the
series.  And use_fsmonitor boolean does not seem to be what the
comment says---it just tells us if fsmonitor is in use in the
operation of the untracked cache, no?

I don't have any more planned bit fields. I only needed a single bit so used a bit field just in case someone else comes by later and needs another bit. If you aren't worried about that, we can just make this an int.

Correct. The bit just indicates whether fsmonitor has been used to ensure the cache is current or if it needs to be checked. I have comments to this effect where the flag is used in the code but could move/duplicate them into the header if wished. For example:

/* With fsmonitor, we can trust the untracked cache's valid field. */

and

/* Now mark the untracked cache for fsmonitor usage */


diff --git a/entry.c b/entry.c
index cb291aa88b..5e6794f9fc 100644
--- a/entry.c
+++ b/entry.c
@@ -4,6 +4,7 @@
  #include "streaming.h"
  #include "submodule.h"
  #include "progress.h"
+#include "fsmonitor.h"
static void create_directories(const char *path, int path_len,
  			       const struct checkout *state)
@@ -357,6 +358,7 @@ static int write_entry(struct cache_entry *ce,
  			lstat(ce->name, &st);
  		fill_stat_cache_info(ce, &st);
  		ce->ce_flags |= CE_UPDATE_IN_BASE;
+		mark_fsmonitor_invalid(state->istate, ce);
  		state->istate->cache_changed |= CE_ENTRY_CHANGED;

Similar to "how does mark_fsmonitor_valid() relate to
ce_mark_uptodate()?" question earlier, mark_fsmonitor_invalid()
seems to often appear in pairs with the update to cache_changed.
Are there cases where we need CE_ENTRY_CHANGED bit added to the
index state yet we do not want to call mark_fsmonitor_invalid()?  I
am wondering if there should be a single helper function that lets
callers say "I changed this ce in this istate this way.  Please
update CE_VALID, CE_UPDATE_IN_BASE and CE_FSMONITOR_VALID
accordingly".

That "changed _this way_" is deliberately vague in my question
above, because it is not yet clear to me when mark-valid and
mark-invalid should and should not be called from the series.


Please let me know if my comment/patch above does not address this concern sufficiently.

+	/* a fsmonitor process can return '*' to indicate all entries are invalid */
+	if (query_success && query_result.buf[0] != '/') {
+		/* Mark all entries returned by the monitor as dirty */

The comment talks about '*' and code checks if it is not '/'?  The
else clause of this if/else handles the "invalidate all" case, so
shouldn't we be comparing with '*' instead here?

Ah, fsmonitor-watchman section of the doc tells us to write the hook
in a way to return slash, so the comment above the code is stale and
the comparison with '/' is what we want, I guess.


Correct. Sorry about missing that. Here is a patch that can be squashed in.

diff --git a/fsmonitor.c b/fsmonitor.c
index b8b2d88fe1..7c1540c054 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -176,7 +176,7 @@ void refresh_fsmonitor(struct index_state *istate)
core_fsmonitor, query_success ? "success" : "failure");
        }

- /* a fsmonitor process can return '*' to indicate all entries are invalid */ + /* a fsmonitor process can return '/' to indicate all entries are invalid */
        if (query_success && query_result.buf[0] != '/') {
                /* Mark all entries returned by the monitor as dirty */
                buf = query_result.buf;




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux