Re: [PATCH/RFC v2 06/16] t3700: sleep for 1 second, to avoid interfering with the racy code

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 09 Aug 2012 09:51:26 -0700

Thomas Gummerer <t.gummerer@xxxxxxxxx> writes:

> On 08/08, Junio C Hamano wrote:
>> ...
>> Let me ask the same question in a more direct way.  Which part of
>> this test break with your series?
>> 
>>         test_expect_success 'git add --refresh with pathspec' '
>>                 git reset --hard &&
>>                 echo >foo && echo >bar && echo >baz &&
>>                 git add foo bar baz && H=$(git rev-parse :foo) && git rm -f foo &&
>>                 echo "100644 $H 3	foo" | git update-index --index-info &&
>> 	# "sleep 1 &&" in the update here ...
>>                 test-chmtime -60 bar baz &&
>>                 >expect &&
>>                 git add --refresh bar >actual &&
>>                 test_cmp expect actual &&
>> 
>>                 git diff-files --name-only >actual &&
>>                 ! grep bar actual&&
>>                 grep baz actual
>>         '
>> 
>> We prepare an index with bunch of paths, we make "foo" unmerged, we
>> smudge bar and baz stat-dirty, so that "diff-files" would report
>> them, even though their contents match what is recorded in the
>> index.
>
> After getting confused a bit myself, I now think here is the problem.
> The v5 code smudges baz when doing git add --refresh bar.  Therefore
> baz isn't considered stat-dirty by the code, but a racily smudged entry
> and therefore its content gets checked, thus not showing up in
> git diff-files.

So in short, the breakage is the last one among the three choices I
gave you in my message you are responding to.  The user asked to
refresh "bar" so that later diff-files won't report a false change
on it, but "baz" effectively ends up getting refreshed at the same
time and a false change is not reported.

That "breakage" is, from the correctness point of view, not a
breakage.  As the primary purpose of "refreshing" is to support
commands that want to rely on a quick ce_modified() call to tell
files that are modified in the working tree since it was last added
to the index---you refresh once, and then you call such commands
many times without having to worry about having to compare the
contents between the indexed objects and the working tree files.

But from the performance point of view, which is the whole point of
"refresh", the behaviour of the new code is dubious.  If the user is
working in a large working tree (which automatically means large
index, the primary reason we are doing this v5 experiment), the user
often is working in a deep and narrow subdirectory of it, and a path
limited refresh (the test names a specific file "bar", but imagine
it were "." to limit it to the directory the user is working in) may
be a cheap way not to bother even checking outside the area the user
currently is working in.  Also, smudging more entries than necessary
to be checked by ce_modified_check_fs() later at runtime may mean
that it defeats the "refresh once and then compare cheaply many
times" pattern that is employed by existing scripts.

Is the root cause really where the "racily-clean so smudge to tell
later runtime to check contents" bit goes?  I am hoping that the
issue is not coming from the difference between the current code and
your code when they decide to "smudge", what entries they decide to
"smudge" and based on what condition.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html