Re: An interaction with ce_match_stat_basic() and autocrlf

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> On Tue, 8 Jan 2008, Junio C Hamano wrote:
>> 
>> This is caused partly by the breakage in size_only codepath of
>> diff.c::diff_populate_filespec().
>
> Only partially.

Agreed.  That's why it is "just a half of the story".

> The more fundamental behaviour (that of git update-index) is caused by 
> ie_modified() thinking that when DATA_CHANGED is true, it cannot possibly 
> need to call "ce_modified_check_fs()":
> ...
> Similarly, I think that the problem with "diff" not realizing they might 
> be the same comes from ie_match_stat(), which has a similar problem in not 
> realizing that DATA_CHANGED could possibly still mean that it's the same.

Yes, I think your patch to ie_modified() should take care of the
issue from the diff-files front-end side, which is the right
approach.  The optimization diffcore_populate_filespec() makes
when asked to do size_only, which predates the addition of
convert_to_git(), needs to be updated regardless, though.  The
size field in diffcore_filespec is never about on-filesystem
size.

> This patch should fix it, but I suspect we should think hard about that 
> change to ie_modified(), and see what the performance issues are (ie that 
> code has tried to avoid doing the more expensive ce_modified_check_fs() 
> for a reason).

I think the reason was I simply avoided doing any unnecessary
operation that goes to the filesystem.  We did not even have
that modified_check_fs() code before the racy-git safety, and
when I added it I do not think I benched it with a real-life
workload; the logic there was simply a valid optimization back
then.

It is not anymore.  Addition of convert_to_git() made cached
stat info essentially ineffective in the sense that:

 (1) if a user changes the work tree files in such a way that
     does not change convert_to_git() output, the index will say
     "file contents in external representation has definitely
     changed, the sizes no longer match".  We need to actually
     go to the data to find out that there is no change at the
     canonical level.

 (2) if a user changes the crlf setting (or .gitattributes)
     without touching the work tree files, the index will say
     "unchanged and do not have to compare".  We need to
     actually go to the data to find out that they do not match
     anymore.

The latter is an opposite issue of what I brought up in this
thread.  I personally do not want to "fix" it --- it means
destroying one of the most important optimizations.  The use
case is essentially a one-shot operation for a user to "fix" a
broken crlf setting, and having to re-checkout everything is a
small cost to pay to maintain it.

But the former is something we should be able to deal with
sanely.
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux