Thanks Johannes for the review. This patch hopefully fixes the problems you mentioned:

 - Hexadecimal numbers, binary numbers and numbers containing `_` are now treated as single tokens:

        Pre              Post
        0xFF_EC_DE_5E    0xFF_E1_DE_5E
        0b100_000        0b100_100
        100_000          200_000

   Even though only a single character changes in each of the numbers above, the diff reports each number as one changed token rather than splitting it into pieces.

 - More tests have been added for "proper" multi-character operators. The earlier regex treated a++!=++b as three tokens (a, ++!=++, b); this patch splits it correctly into (a, ++, !=, ++, b).
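For anyone following along, here is a rough Python sketch of the intended tokenization. It is not the regex from the patch itself: the patterns, the operator subset and the names tokenize/naive_tokenize are all hypothetical, chosen only to illustrate the two points above. Alternatives are tried left to right, so number literals come before identifiers and two-character operators come before the single-character fallback. A deliberately naive tokenizer (again hypothetical, not the earlier version of the patch) is included for contrast.

```python
import re

# Hypothetical word-diff tokenizer, NOT the regex from the patch.
# Alternation is tried left to right, so earlier patterns win.
TOKEN_RE = re.compile(
    r"""
      0[xX][0-9a-fA-F_]+       # hexadecimal literal, '_' separators allowed
    | 0[bB][01_]+              # binary literal, '_' separators allowed
    | [0-9][0-9_]*             # decimal literal, '_' separators allowed
    | [A-Za-z_][A-Za-z0-9_]*   # identifier
    | \+\+ | -- | != | == | <= | >= | && | \|\|   # multi-character operators (illustrative subset)
    | \S                       # any other single non-space character
    """,
    re.VERBOSE,
)

# Hypothetical naive tokenizer for contrast: numbers stop at '_' or a letter,
# and any run of operator characters is glued into one token.
NAIVE_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[0-9]+|[-+*/<>%&^|=!]+|\S")


def tokenize(text: str) -> list[str]:
    """Split text into word-diff tokens using TOKEN_RE."""
    return TOKEN_RE.findall(text)


def naive_tokenize(text: str) -> list[str]:
    """Split text into tokens using the naive NAIVE_RE."""
    return NAIVE_RE.findall(text)


if __name__ == "__main__":
    for sample in ["0xFF_EC_DE_5E", "0b100_000", "100_000", "a++!=++b"]:
        print(f"{sample!r}: {tokenize(sample)} vs naive {naive_tokenize(sample)}")
```

With the sketch, each number in the Pre/Post table stays one token and a++!=++b splits into five tokens, while the naive version glues ++!=++ together and breaks 100_000 into 100 and _000.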