On Thu, Jun 18, 2015 at 09:49:19PM -0700, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
>
> > ... I think you could also argue that
> > because of whitespace-highlighting, colorized diffs are fundamentally
> > going to have colors intermingled with the content and should not be
> > parsed this way.
>
> Painting of whitespace breakages is asymmetric [*1*].  If you change
> something on a badly indented line without fixing the indentation,
> e.g.
>
>     -<SP><TAB>hello-world
>     +<SP><TAB>hello-people
>
> only the space-before-tab on the latter line is painted.
>
> For the purpose of your diff highlighting, however, you would want
> to treat the leading "<SP><TAB>hello-" on preimage and postimage
> lines as unchanged.

I do strip it off, so it is OK for it to be different in both the
pre-image and post-image. But what I can't tolerate is the intermingling
with actual data:

  +\t\t\x1b[32m;foo
  +\t\x1b[32m;bar

Those are both post-image lines. I can strip off the "+" from each side
to compare their inner parts to the pre-image. But the intermingled
color gets in my way. I can simply strip all colors from the whole line,
but then information is lost; how do I know where to put them back
again? It is not just "add back the color at the beginning" (which is
what I do with the prefix).

I think the answer is that I must strip them out, retaining the colors
found in each line along with their offset into the line, and then as I
write out the lines, re-add them at the appropriate spots (taking care
to use the original offsets, not the ones with the highlighting added
in).

> > All the more reason to try to move this inside diff.c, I guess.
>
> That too, probably.

Hmm, I thought that would solve all my problems by operating on the
pre-color version without much more work. But...

> If we were to do this, I think it may make sense to separate the
> logic to compute which span of the string need to be painted in what
> color and the routine to actually emit the colored output.
> That is, instead of letting ws-check-emit to decide which part should
> be in what color _and_ emitting the result, we should have a helper
> that reads <line, len>, and give us an array of spans to flag as
> whitespace violation.  Then an optional diff-highlight code would
> scan the same <line, len> (or a collection of <line, len>) without
> getting confused by the whitespace errors and annotate the spans to
> be highlighted.  After all that happens, the output routine would
> coalesce the spans and produce colored output.
>
> Or something like that ;-)

I think this "array of spans" is the only way to go. Otherwise whichever
markup scheme processes the hunk first ruins the data for the next
processor. So it is a lot more work to make the two work together.

The --word-diff code would have the same issue, except that I imagine it
just skips whitespace-highlighting altogether.

The least-work thing may actually be teaching the separate
diff-highlight script to strip out the colorizing and re-add it by
offset.

-Peff
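
For illustration only, the "strip out the colorizing and re-add it by
offset" idea might look roughly like the sketch below. It is not code
from diff-highlight or diff.c; the names COLOR_RE, strip_colors and
readd_colors are made up here, and it assumes the only escape sequences
in a line are SGR color codes of the form ESC [ ... m. Colors are
recorded with their offset into the color-free text, so any highlighting
added later can be expressed against the same offsets and merged in on
output.

```python
import re

# Assumption: only SGR color sequences such as "\x1b[32m" or "\x1b[m" occur.
COLOR_RE = re.compile(r"\x1b\[[0-9;]*m")


def strip_colors(line):
    """Return (plain, colors) for a colorized line.

    colors is a list of (offset, sequence) pairs, where offset is the
    position in the color-free text at which the sequence appeared.
    """
    plain_parts = []
    colors = []
    pos = 0          # read position in the original line
    plain_len = 0    # length of color-free text collected so far
    for m in COLOR_RE.finditer(line):
        chunk = line[pos:m.start()]
        plain_parts.append(chunk)
        plain_len += len(chunk)
        colors.append((plain_len, m.group()))
        pos = m.end()
    plain_parts.append(line[pos:])
    return "".join(plain_parts), colors


def readd_colors(plain, colors, extra=()):
    """Rebuild a line from color-free text plus offset-keyed markup.

    extra holds additional (offset, sequence) pairs, e.g. highlighting,
    also expressed as offsets into the color-free text. The sort is
    stable, so at equal offsets the original colors are emitted first.
    """
    marks = sorted(list(colors) + list(extra), key=lambda mark: mark[0])
    out = []
    pos = 0
    for offset, seq in marks:
        out.append(plain[pos:offset])
        out.append(seq)
        pos = offset
    out.append(plain[pos:])
    return "".join(out)
```

With the first post-image line above, strip_colors("+\t\t\x1b[32m;foo")
yields the plain text "+\t\t;foo" and one color at offset 3, and
readd_colors() on that result reproduces the original line. Because the
extra markup uses offsets into the plain text, it does not matter how
many bytes the original colors occupied.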