Matthijs Kooijman wrote: > This is somewhat expected, of course, since the --word-diff formats are > documented to show only changes to words, not to non-words/whitespace. > So I guess it is expected that the output is ambigious wrt whitespace, > but if so, what is the use of this porcelain format? Wouldn't it be make > a lot more sense to make the format unambiguous and make it do > word-based diff at the same time? I think this should be possible > because of the explicit notation used for the newline. It's not just the newlines. You will note that when diffing e.g. a x c vs. a b c then because the word-diff engine doesn't bother with the non-word parts, you will end up with (using ordinary --word-diff) a [-x-]{+b+} c or using the porcelain format $ git diff --no-index --word-diff=porcelain a b | cat -E diff --git 1/a 2/b$ index 4f47486..b040ae0 100644$ --- 1/a$ +++ 2/b$ @@ -1 +1 @@$ a $ -x$ +b$ c$ ~$ Note how the second space after 'x' disappeared. The case that Wincent explains in the email he linked (thanks; I didn't even see it the first time around) shows a more drastic example. So the logic that inserts the space again is too simplistic. What it does not is something like (haven't checked again) always insert the postimage space that went with the preceding word. What it *would* have to do is actually compute differences in the space, while at the same time ignoring them for the purposes of the LCS algorithm. If it weren't for the customizable word-regex, it might be enough to put the spaces in the words again but enable the internal equivalent of -b. > Looking at the format itself, it's a bit unclear to me what the ~ lines > mean exactly. Commit 882749, which introduced the format says the mean > "newlines in the input", but I'm not sure if this means the old file, > new file or both. As Matthieu already said, the history of --word-diff=porcelain is just: I needed a way to pipe word-diff output to gitk for coloring, without any messing around with the color strings. The ~ is the minimal required feature to at least let git communicate *some* linebreaks. You are probably right in that while the format leaves open the possibility of showing addition and removal of space (although the current engine does not really attempt to compute it), I did not think far enough to let it show addition and removal of whitespace. > For example, Specifying the ~ lines to mean a newline in the old, new or > both files depending on the previous +, - or space prefixed line is > probably enough for this. By generating empty +, - or space prefixed > lines when needed, every occurence of ~ could be disambiguated. AFAICS that breaks down if you couple a newline-change-aware consumer with a non-aware producer (i.e., a new frontend with an old git), since there is currently no guarantee that the line before a ~ is context. So you would have to introduce a new format (porcelain2?) anyway, and if you're already going that route, then for simplicity I'd rather have something like ~+ (or so) instead of requiring the parser to track state. -- Thomas Rast trast@{inf,student}.ethz.ch -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html