Re: [PATCH v7 03/12] update-index: add a new --force-write-index option

On 9/20/2017 1:47 AM, Junio C Hamano wrote:
> Ben Peart <benpeart@xxxxxxxxxxxxx> writes:
>
>> +		OPT_SET_INT(0, "force-write-index", &force_write,
>> +			N_("write out the index even if is not flagged as changed"), 1),
>
> Hmph.  The only time this makes difference is when the code forgets
> to mark active_cache_changed even when it actually made a change to
> the index, no?  I do understand the wish to be able to observe what
> _would_ be written if such a bug did not exist in order to debug the
> other aspects of the change in this series, but at the same time I
> fear that we may end up sweeping the problem under the rug by
> running the tests with this option.


This is to enable a performance optimization I discovered while perf testing the patch series. It lets us do a lazy index write for fsmonitor-detected changes while still always generating correct results.

Let's see how my ASCII art skills do at describing this:

1) Index marked dirty on every fsmonitor change:
A---x---B---y---C

2) Index *not* marked dirty on fsmonitor changes:
A---x---B---x,y---C

Assume the index is written and up-to-date at point A.

In scenario #1 above, the index is marked fsmonitor dirty every time the fsmonitor hook reports a modified file. At point B, the fsmonitor integration script returns that file 'x' has been modified since A; the index is marked dirty and then written to disk with a last_update time of B. At point C, the script returns 'y' as the change since point B; the index is marked dirty and written to disk again.

In scenario #2, the index is *not* marked fsmonitor dirty when changes are detected. At point B, the script returns 'x', but the index is not flagged dirty nor written to disk. At point C, the script returns both 'x' and 'y' (since both have been changed since time 'A'), and again the index is not marked dirty nor written to disk.

Correct results are generated in both scenarios, but scenario #2 needs two fewer index writes. In short, the changed files simply accumulate: the cost of processing two files at point C (vs one) makes no measurable difference in perf, but avoiding two unnecessary index writes is a significant savings (especially when the index gets large).

There is no real concern about accumulating too many changes, as 1) the processing cost for additional modified files is fairly trivial and 2) the index ends up getting written out pretty frequently anyway as files are added/removed/staged/etc., which updates the fsmonitor_last_update time.

The challenge came when it was time to test that the changes to the index were correct. Since they are lazily written by default, I needed a way to force the write so that I could verify the index on disk was correct. Hence, this patch.


  		OPT_END()
  	};
@@ -1147,7 +1150,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
  		die("BUG: bad untracked_cache value: %d", untracked_cache);
  	}
-	if (active_cache_changed) {
+	if (active_cache_changed || force_write) {
  		if (newfd < 0) {
  			if (refresh_args.flags & REFRESH_QUIET)
  				exit(128);


