Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes: > On Tue, 8 Jan 2008, Junio C Hamano wrote: >> >> This is caused partly by the breakage in size_only codepath of >> diff.c::diff_populate_filespec(). > > Only partially. Agreed. That's why it is "just a half of the story". > The more fundamental behaviour (that of git update-index) is caused by > ie_modified() thinking that when DATA_CHANGED is true, it cannot possibly > need to call "ce_modified_check_fs()": > ... > Similarly, I think that the problem with "diff" not realizing they might > be the same comes from ie_match_stat(), which has a similar problem in not > realizing that DATA_CHANGED could possibly still mean that it's the same. Yes, I think your patch to ie_modified() should take care of the issue from the diff-files front-end side, which is the right approach. The optimization diffcore_populate_filespec() makes when asked to do size_only, which predates the addition of convert_to_git(), needs to be updated regardless, though. The size field in diffcore_filespec is never about on-filesystem size. > This patch should fix it, but I suspect we should think hard about that > change to ie_modified(), and see what the performance issues are (ie that > code has tried to avoid doing the more expensive ce_modified_check_fs() > for a reason). I think the reason was I simply avoided doing any unnecessary operation that goes to the filesystem. We did not even have that modified_check_fs() code before the racy-git safety, and when I added it I do not think I benched it with a real-life workload; the logic there was simply a valid optimization back then. It is not anymore. Addition of convert_to_git() made cached stat info essentially ineffective in the sense that: (1) if a user changes the work tree files in such a way that does not change convert_to_git() output, the index will say "file contents in external representation has definitely changed, the sizes no longer match". We need to actually go to the data to find out that there is no change at the canonical level. (2) if a user changes the crlf setting (or .gitattributes) without touching the work tree files, the index will say "unchanged and do not have to compare". We need to actually go to the data to find out that they do not match anymore. The latter is an opposite issue of what I brought up in this thread. I personally do not want to "fix" it --- it means destroying one of the most important optimizations. The use case is essentially a one-shot operation for a user to "fix" a broken crlf setting, and having to re-checkout everything is a small cost to pay to maintain it. But the former is something we should be able to deal with sanely. - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html