Steven Grimm <koreth@xxxxxxxxxxxxx> wrote:
> Junio rightly points out that it would be a mistake to discard \r
> characters from binary files when computing similarity scores. So now we
> only do it if the file contents test as non-binary.
>
> The file attributes aren't available at this level of the code, but they
> could be propagated down from the higher levels if we don't trust
> buffer_is_binary() to make an adequately accurate decision.

Ick. If we can get the attributes into diff_filespec this is
pretty easy, as you can do a crlf->lf conversion on both files
if both are considered to be text, but it doesn't look like it
would be very easy to get the attributes into the diff_filespec.

Actually, it would be even better if you could also run the in/out
filter things. I'm thinking of, say, an XML file that has had whitespace
formatting changes, but whose XSD and processors ignore unnecessary
whitespace. It'd be nice if the rename detection was actually able to
canonicalize both files before detecting the rename, assuming both
files had a canonicalizer input filter defined that does that...

Of course diff.c defines a nice diff_is_binary() at file scope
that at least makes a "can we diff this" decision. Might be good
if that could be reused for the rename detection.

OK, that's far more than I actually know about diffcore. This is
one for Junio, Linus, you, and those who are less tired than I feel
right now... ;-)

Personally I'd rather see us doing the right thing (use attributes
and fall back on guessing if no preference is stated either way)
over doing something half-a**ed (only guessing).

-- 
Shawn.
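
A minimal sketch of the "use attributes, fall back on guessing" idea, in C. Every name here (text_attr, guess_is_binary, treat_as_text, crlf_to_lf, both_text) is hypothetical and not taken from diff.c or diffcore. The NUL-byte check is only an assumed stand-in for whatever buffer_is_binary() actually does, and how the attribute would reach this level of the code is left open, as it is in the message above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tri-state for a per-path text attribute; not a git type. */
enum text_attr { ATTR_UNSET, ATTR_TEXT, ATTR_BINARY };

/*
 * Guess: call the buffer binary if a NUL byte appears near the start.
 * Assumption: the 8000-byte window is illustrative only.
 */
int guess_is_binary(const char *buf, size_t len)
{
	if (len > 8000)
		len = 8000;
	return memchr(buf, 0, len) != NULL;
}

/* A stated attribute wins; guess only when no preference is stated. */
int treat_as_text(enum text_attr attr, const char *buf, size_t len)
{
	if (attr == ATTR_TEXT)
		return 1;
	if (attr == ATTR_BINARY)
		return 0;
	return !guess_is_binary(buf, len);
}

/*
 * Copy src into dst, dropping each \r that is immediately followed by \n.
 * A lone \r is kept. dst must have room for len bytes.
 * Returns the number of bytes written.
 */
size_t crlf_to_lf(char *dst, const char *src, size_t len)
{
	size_t i, out = 0;

	for (i = 0; i < len; i++) {
		if (src[i] == '\r' && i + 1 < len && src[i + 1] == '\n')
			continue;
		dst[out++] = src[i];
	}
	return out;
}

/* Convert only when both sides of the candidate rename are text. */
int both_text(enum text_attr a_attr, const char *a, size_t a_len,
	      enum text_attr b_attr, const char *b, size_t b_len)
{
	return treat_as_text(a_attr, a, a_len) &&
	       treat_as_text(b_attr, b, b_len);
}
```

A caller would check both_text() for the pair and, only if it returns nonzero, run crlf_to_lf() on each side before computing the similarity score. Binary content is then never touched, whether it was flagged by attribute or by the guess.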