John Keeping <john@xxxxxxxxxxxxx> writes: > The message for commit 6dd4b66 (Fix diffcore-break total breakage) > indicates that "don't bother to break small files" is wrong in some > cases, but it I wonder if "don't bother to break empty files" is okay. This has a rather subtle ramifications, and we would need to think carefully. "break" does two very different things, and the criteria you would want to use to decide to "break" a filepair depends on which one you are interested in. The very original motivation of "break" was to show a patch that rewrites an existing file completely in a way different from the usual "diff -u" output, which will try to minimize the output by finding (rare) lines that happen to be the same between the preimage and postimage, intersparsed in many lines of "-" (deletion) and "+" (addition). Such a change is often easier to understand when shown as a single block of "-" (deletion) of the entire original contents, followed by a single block of "+" (addition) of the entire new contents. A totally separate motivation of "break" is the one Linus talks about in the log message of the said commit. A path filename.h was moved to filename_32.h, and a new (and much smaller) filename.h was introduced, that "#include"s filename_32.h. "diff -M" that pairs deleted files with created files to compute renames in such a case would not consider the original filename.h as a possible source of filename_32.h that was created. You want to break modification of filename.h into (virtual) deletion and addition of filename.h. For the purpose of the former, you would want not to break a file too aggressively. If you started from a file with 1000 lines in it, and deleted 910 lines and added 10 lines to result in a file with 100 lines, you still have 90 lines of shared contents between the preimage and the postimage, and you do not want to show it as "delete 1000 lines and add 100 lines". You would want to base your decision on how much common material exists between the two. For the purpose of the latter, however, it is better to err on the side of breaking than not breaking. After breaking a suspicious modification into addition and deletion, if rename comparison does not find a suitable destination for the virtual deletion, the broken halves will be merged back, so breaking too little can hurt by missing potential renames, but breaking too much will not. You do want to break the "900 deleted out of 1000 and then added 10" case, as that 900 lines may have gone to another file that was created in the same commit. "But we have 90 lines of common material" does not matter for the decision to break. In today's code, the return value of should_break() is about the latter, while the score it stores in *merge_score is about the former. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html