Re: [PATCH v1] Fix bugs preventing adding updated cache entries to the name hash

Ben Peart <peartben@xxxxxxxxx> · Thu, 15 Mar 2018 14:44:06 -0400

On 3/15/2018 1:58 PM, Junio C Hamano wrote:
Ben Peart <benpeart@xxxxxxxxxxxxx> writes:

Update replace_index_entry() to clear the CE_HASHED flag from the new cache
entry so that it can add it to the name hash in set_index_entry()

OK.

diff --git a/read-cache.c b/read-cache.c
index 977921d90c..bdfa552861 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -62,6 +62,7 @@ static void replace_index_entry(struct index_state *istate, int nr, struct cache
  	replace_index_entry_in_base(istate, old, ce);
  	remove_name_hash(istate, old);
  	free(old);
+	ce->ce_flags &= ~CE_HASHED;
  	set_index_entry(istate, nr, ce);
  	ce->ce_flags |= CE_UPDATE_IN_BASE;
  	mark_fsmonitor_invalid(istate, ce);

As we are removing "old" that is not "ce", an earlier call to
remove_name_hash() that clears the CE_HASHED bit from the cache
entry does not help us at all.  We need to clear the bit from "ce"
ourselves before calling set_index_entry() on it, otherwise the call
would become a no-op wrt the name hash.  Makes sense.


Correct.  As you note below, this one line is sufficient to fix the 
actual bug.

Makes me wonder why "ce" which is a replacement for what is in the
index already has the hashed bit, though.  Is that the failure to
use copy_cache_entry() in the caller the other part of this patch
fixes?  To me it looks like copy_cache_entry() is designed for
copying an entry's data to another one that has a different name,
but in the refresh codepath, we _know_ we are replacing an old entry
with an entry with the same name, so it somehow feels a bit strange
to use copy_cache_entry(), instead of doing memcpy() (and possibly
dropping the HASHED bit from the new copy--but wouldn't that become
unnecessary with the fix to replace_index_entry() we saw above?)


This 2nd part of the patch was more for code cleanliness.  When I was 
investigating why the hashed bit was set, it was caused by this 
memcpy().  When I examined the rest of the code base, I only found 1 
other instance (in dup_entry()) that did a straight memcpy(), the rest 
used the copy_cache_entry() macro.  I updated this code to match that 
pattern as it would have prevented the bug as well though as you 
correctly point out, it is not necessary with the other fix.

Is this fix something we can demonstrate in a new test, by the way?


Unfortunately I was unable to find a way to reliably demonstrate this 
bug and fix with a new test.  I only ran across it while working on 
another patch series that ends up triggering it more reliably.

The symptom was that occasionally in very specific circumstances a git 
status call would incorrectly show some directories as having untracked 
files when there were actually no untracked files in that directory.  A 
subsequent call to status would correctly show no untracked files as 
would git status -uall.

I do have a test for this fix as part of the other patch series but 
thought I'd submit this bug fix separately as it may be a while before 
the other patch series is ready.

Thanks.