On Thu, 20 Dec 2007, Linus Torvalds wrote: > > It only happened for a few files that had lots of repeated lines - so that > the diff could literally be done multiple different ways - and in fact, > the file that caused the problems really had a bogus commit that > duplicated *way* too much data, and caused lots of #define's to exist > twice. Here's the example of this kind of behaviour: in the 2.6.26-rc5 tree the file drivers/video/mbx/reg_bits.h has the #defines for /* DINTRS - Display Interrupt Status Register */ /* DINTRE - Display Interrupt Enable Register */ duplicated twice due to commit ba282daa919f89c871780f344a71e5403a70b634 ("mbxfb: Improvements and new features") by Raphael Assenat mistakenly adding another copy of the same old set of defines that we already got added once before by commit fb137d5b7f2301f2717944322bba38039083c431 ("mbxfb: Add more registers bits access macros"). Now, that was a mistake - and one that probably happened because Rafael or more likely Andrew Morton used GNU patch with its insane defaults (which is to happily apply the same patch that adds things twice, because it doesn't really care if the context matches or not). But what that kind of thing causes is that when you create a patch of the end result, it can show the now new duplicate lines two different (but equally valid) ways: it can show it as an addition of the _first_ set of lines, or it can show it as an addition of the _second_ set of lines. They are the same, after all. Now, it doesn't really matter which way you choose to show it, although because of how "git diff" finds similarities, it tends to prefer to show the second set of identical lines as the "new" ones. Which is generally reasonable. However, that interacted really badly with the new git logic that said that "if the two files end in the same sequence, just ignore the common tail of the file", because the latter copy of the identical lines would now show up as _part_ of that common tail, so the lines that the git diff machinery would normally like to show up as "new" did in fact end up being considered uninteresting, because they were part of an idential tail. So now "git diff" would happily pick _earlier_ lines as the new ones, and it would still be a conceptually valid diff, but because we had trimmed the tail of the file, that conceptually valid diff no longer had the expected shared context at the end. And while it's a bit embarrassing, I'm really rather happy that both GNU patch and "git apply" actually refused to apply the patch. It may have been "conceptually correct" (ie it did really contain all of the changes!) but because it lacked the expected context it really wasn't a good patch. That was a rather long-winded explanation of what happened, mainly because it was all very unexpected to me, and I had personally mistakenly thought the git optimization was perfectly valid and actually had to go through the end result to see what was going on. Anyway, the diff on kernel.org should be all ok now, and mirrored out too. Linus - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html