Difficulty with parsing colorized diff output

Hello, I have a rather elaborate diff highlighter that I have implemented as a post-processor to regular git output. I am writing to discuss some difficult aspects of git diff's color output that I am observing with version 2.19.2. This is not a regression report; I am trying to implement a new feature and am stymied by these details.

My goal is to detect SGR color sequences, e.g. '\x1b[32m', that exist in the source text, and have my highlighter print escaped representations of those. For example, I have checked in files that are expected test outputs for tools that emit color codes, and diffs of those get very confusing.
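
To make the goal concrete, here is a minimal sketch of the kind of escaping described above. It is not the actual same-same code; the names `SGR_RE` and `escape_sgr` are hypothetical, and it assumes that only sequences of the form ESC `[` parameters `m` need to be handled.

```python
import re

# Hypothetical helper, not taken from same-same.
# Matches SGR sequences of the form ESC [ <digits and semicolons> m.
SGR_RE = re.compile(r'\x1b\[[0-9;]*m')

def escape_sgr(text: str) -> str:
    """Replace the ESC byte of each SGR sequence with the visible text '\\x1b'."""
    return SGR_RE.sub(lambda m: m.group(0).replace('\x1b', '\\x1b'), text)

if __name__ == '__main__':
    # The color codes are printed as literal text instead of changing the terminal color.
    print(escape_sgr('plain \x1b[32mgreen\x1b[m text'))
```

This only works once the codes added by git have been removed, which is the difficult part.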

Figuring out which color codes are from the source text and which were added by git is proving very difficult. The obvious solution is to turn git diff coloring off, but as far as I can see this also turns off all coloring for logs, which is undesirable.

Then I tried to remove just the color codes that git adds to the diff. This almost works, but there are some irregularities. Most lines begin with a style/color code and end with a reset code, which would be a perfect indicator that git is using colors. However:

* Context lines do not begin with a color code, but do end with a reset code. It would be preferable in my opinion if they had both (like every other line), or neither.

* Added lines have excess codes after the plus sign. The entire prefix is `\x1b[32m+\x1b[m\x1b[32m`, which translates to GREEN PLUS RESET GREEN. Emitting codes after the plus sign makes the parsing more complex and idiosyncratic.
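
As an illustration of the special-casing this requires, here is a rough sketch of stripping git's codes from an added line. It assumes the default green for added lines, a single trailing reset, and a line with its newline already removed; `ADDED_PREFIX`, `RESET`, and `strip_added_line` are hypothetical names, not part of git or same-same.

```python
# Hypothetical constants, based only on the prefix observed with git 2.19.2.
ADDED_PREFIX = '\x1b[32m+\x1b[m\x1b[32m'  # GREEN PLUS RESET GREEN
RESET = '\x1b[m'

def strip_added_line(line: str):
    """Return '+' plus the line content with git's codes removed.

    Returns None if the line does not start with the added-line prefix.
    Any SGR codes remaining in the result are assumed to come from the source text.
    """
    if not line.startswith(ADDED_PREFIX):
        return None
    body = line[len(ADDED_PREFIX):]
    # Remove only the final reset, which is assumed to be the one git appended.
    if body.endswith(RESET):
        body = body[:-len(RESET)]
    return '+' + body
```

Context lines and removed lines would each need their own variant of this, which is why a uniform "begins with a code, ends with RESET" rule would be so much easier to parse.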


In summary, I would like to suggest the following improvements:

* Remove the excess codes after the plus sign.

* When git diff is adding colors, ensure that every line begins with an SGR code and ends with the RESET code.

* Add a config feature to turn on log coloring while leaving diff coloring off.


I would be willing to attempt a fix for this myself, but I'd like to hear what the maintainers think first, and would appreciate any hints as to where I should start looking in the code base.


If anyone is curious about the implementation, it is called `same-same` and lives here: https://github.com/gwk/pithy/blob/master/pithy/bin/same_same.py

I configure it like this in .gitconfig:

[core]
  pager = same-same | LESSANSIENDCHARS=mK less --RAW-CONTROL-CHARS
[interactive]
  diffFilter = same-same -interactive | LESSANSIENDCHARS=mK less --RAW-CONTROL-CHARS


Thank you,
George