On 28.10.22 at 23:08, Simeon Krastnikov wrote:
> Hello,
>
> Given an initial file with the contents "not to be", which I then change
> to "to be", the output of 'git diff --color-words', is
>
>    notto be
>
> with the first three letters colored red. To me this seems incorrect as
> it implies, or at least misleadingly suggests, that there was no space
> between "not" and "to" in the original file. (Even though in that case
> the output is actually "nottoto be" with the "notto" in red and "to" in
> green.)
>
> If instead I start with a file with contents "to be", which I then
> change to "not to be", then the output is as expected:
>
>    not to be
>
> (First three letters colored green.)
>
> Am I correct in seeing this as a bug? If so, any tips on what parts of
> diff.c to look at when starting a patch?

Well, not really. When you have a file with

    Line one.
    Line two.

then change it to

    Line ONE.
    Line TWO.

then --color-words currently prints it as

    Line one.ONE.
    Line two.TWO.

because it does not print the whitespace after[*] a sequence of deleted
words. But if it were printed, we would see

    Line one.
    ONE.
    Line two.
    TWO.

That is considered inferior; hence, it isn't printed.

The current algorithm produces sensible output in the vast majority of
cases while also being fairly straightforward. To make it work "better"
(for some definition of that word) in the borderline cases, the
algorithm would have to be made considerably more sophisticated.

[*] It might be whitespace before a sequence of words, but that does not
change the gist of the argument.

-- Hannes
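
To make the trade-off above concrete, here is a small toy model in Python. It is only a sketch and not the code in diff.c: it assumes whitespace-separated words, uses Python's difflib for the word matching, and marks changes as [-deleted-] and {+added+} (the notation of 'git diff --word-diff=plain') instead of colors. The names word_diff and keep_deleted_space are made up for this illustration.

```python
import difflib
import re


def tokenize(text):
    # Split into (word, whitespace that follows the word) pairs.
    return re.findall(r"(\S+)(\s*)", text)


def word_diff(old, new, keep_deleted_space=False):
    """Toy word diff; NOT git's algorithm, only a model of the rule above.

    Whitespace after the last word of a deleted run is dropped unless
    keep_deleted_space is True (a hypothetical switch for comparison).
    """
    a, b = tokenize(old), tokenize(new)
    matcher = difflib.SequenceMatcher(
        None, [w for w, _ in a], [w for w, _ in b], autojunk=False
    )
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed = a[i1:i2]
            for k, (word, space) in enumerate(removed):
                out.append(f"[-{word}-]")
                is_last = k == len(removed) - 1
                if not is_last or keep_deleted_space:
                    out.append(space)
        if tag in ("insert", "replace"):
            for word, space in b[j1:j2]:
                out.append(f"{{+{word}+}}{space}")
        if tag == "equal":
            # Unchanged words keep the whitespace of the new text.
            for word, space in b[j1:j2]:
                out.append(word + space)
    return "".join(out)


old, new = "Line one.\nLine two.\n", "Line ONE.\nLine TWO.\n"
print(word_diff(old, new), end="")
# Line [-one.-]{+ONE.+}
# Line [-two.-]{+TWO.+}
print(word_diff(old, new, keep_deleted_space=True), end="")
# Line [-one.-]
# {+ONE.+}
# Line [-two.-]
# {+TWO.+}
print(word_diff("not to be", "to be"))
# [-not-]to be
```

In this model, keeping the trailing whitespace fixes the "not to be" case but splits every changed line of the two-line example in two, because the whitespace that follows the deleted word there is a newline.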