Mike Crowe <mac@xxxxxxxxxx> writes: > I think that if there's a risk that file contents will undergo conversion > then this should force the diff to check the file contents. Something like: > > diff --git a/diff.c b/diff.c > index 051761b..bee1662 100644 > --- a/diff.c > +++ b/diff.c > @@ -3413,13 +3413,14 @@ void diff_setup_done(struct diff_options *options) > /* > * Most of the time we can say "there are changes" > * only by checking if there are changed paths, but > - * --ignore-whitespace* options force us to look > - * inside contents. > + * --ignore-whitespace* options or text conversion > + * force us to look inside contents. > */ > > if (DIFF_XDL_TST(options, IGNORE_WHITESPACE) || > DIFF_XDL_TST(options, IGNORE_WHITESPACE_CHANGE) || > - DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL)) > + DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL) || > + DIFF_OPT_TST(options, ALLOW_TEXTCONV)) > DIFF_OPT_SET(options, DIFF_FROM_CONTENTS); > else > DIFF_OPT_CLR(options, DIFF_FROM_CONTENTS); Thanks. You may be on the right track to find FROM_CONTENTS bit, but I think TEXTCONV bit is a red-herring. This part of diff.c caught my attention, while thinking about this topic: if (output_format & DIFF_FORMAT_NO_OUTPUT && DIFF_OPT_TST(options, EXIT_WITH_STATUS) && DIFF_OPT_TST(options, DIFF_FROM_CONTENTS)) { /* * run diff_flush_patch for the exit status. setting * options->file to /dev/null should be safe, because we * aren't supposed to produce any output anyway. */ and the body of this "if" statement loops over q->queue[]. It is about the "even though we prefer not having to format the patch because we are doing --quiet, we need to see if contents of one and two that we _know_ are different are made into the same thing when we compare them while ignoring various forms of whitespace changes". So one and two that are removed in earlier step because they were truly identical may not be penalized if you flip FROM_CONTENTS bit early on. Having said that, DIFF_FROM_CONTENTS is about all paths this options structure governs, not just paths that have eol conversion defined. When you say "diff --ignore-whitespace-change", that applies to all paths. The eol conversion is specified per-path, so ideally the FROM_CONTENTS bit should be flipped if and only if one or more of the paths would need the conversion (i.e. does the helper function would_convert_to_git() say "yes" to the path?). I however suspect that necessary information to do so (i.e. "which paths we are looking at?") has not been generated yet at the point of the code you quoted. setup comes (and must come) very early, and then q->queue[] is populated by different front-end functions that compare trees, the index, and the working tree, depending on the "git diff" option or "git diff-{tree,index,files}" plumbing command, and you can ask "would one of these paths need conversion?" only after q->queue[] is populated. Hmm..... Another thing is that ALLOW_TEXTCONV is not a right bit to check for your example. It is "do we use textconv filters to turn binary files into a (phony) text representation before comparing?". People use the mechanism to throw JPEG photos in Git and have textconv filter to extract only EXIF data, and "diff --textconv" would let us textually compare only the EXIF data (which is the only human readable part of the contents anyway). It might be a good idea to also flip FROM_CONTENTS bit under "diff --textconv", and ... > This solves the problem for me and my test case now passes. ... but I suspect that you were misled to think it fixes your issue, only because "--textconv" is by default enabled for "git diff" and "git log" (see "git diff --help"). I think you are saying that if you always set FROM_CONTENTS bit on, you get what you want. But that is to be expected and it unfortunately penalizes everybody by turning an obvious optimization permanently off. Also "--textconv" is not on by default for the plumbing "git diff-index" command and its friends, so it is likely that "git diff-index HEAD" with your change will still not work as you expect. A cheap (from code-change point of view) band-aid might be to flip FROM_CONTENTS on if we know the repository has _some_ paths that need eol conversion, even when the particular "diff" we are taking does not involve any eol conversion (e.g. "is core.crlf set?"). While it may be better than "if we are porcelain (aka ALLOW_TEXTCONV is set), unconditionally flip FROM_CONTENTS on", it is not ideal, either. This almost makes me suspect that the place that checks lengths of one and two in order to refrain from running more expensive content comparison you found earlier need to ask would_convert_to_git() before taking the short-cut, something along the lines of this (for illustration purposes only, not even compile-tested). The "almost" comes to me because I do not offhand know the performance implications of making calls to would_convert_to_git() here. diff.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/diff.c b/diff.c index 051761be40..094d5913da 100644 --- a/diff.c +++ b/diff.c @@ -4921,9 +4921,10 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p) * differences. * * 2. At this point, the file is known to be modified, - * with the same mode and size, and the object - * name of one side is unknown. Need to inspect - * the identical contents. + * with the same mode and size, the object + * name of one side is unknown, or size comparison + * cannot be depended upon. Need to inspect the + * contents. */ if (!DIFF_FILE_VALID(p->one) || /* (1) */ !DIFF_FILE_VALID(p->two) || @@ -4931,7 +4932,16 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p) (p->one->mode != p->two->mode) || diff_populate_filespec(p->one, CHECK_SIZE_ONLY) || diff_populate_filespec(p->two, CHECK_SIZE_ONLY) || - (p->one->size != p->two->size) || + + /* + * only if eol and other conversions are not involved, + * we can say that two contents of different sizes + * cannot be the same without checking their contents. + */ + (!would_convert_to_git(p->one->path) && + !would_convert_to_git(p->two->path) && + (p->one->size != p->two->size)) || + !diff_filespec_is_identical(p->one, p->two)) /* (2) */ p->skip_stat_unmatch_result = 1; return p->skip_stat_unmatch_result;