Re: [bug] git diff --word-diff gives wrong result for utf-8 chinese

Hi Ping

On 01/12/2022 07:33, Ping Yin wrote:
> If the rule is "break on ascii whitespace",
>
> Is there a way to achieve this: break english by word, and break
> chinese by utf-8 character

You could extend your current regex so that it also matches each multi-byte UTF-8 sequence (one encoded code point) as a whole, which is what Git does for the built-in userdiff regexes. I've not tested it, but I think

git config --global diff.wordregex "[[:alnum:]_]+|[^[:space:]]|$(printf '[\xc0-\xff][\x80-\xbf]+')"

should work. The downside is that you end up with a .gitconfig that is not valid UTF-8, because the raw bytes produced by printf are stored in it as-is. Perhaps someone else has a clever idea to get around that.
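One caveat with the command above: POSIX only requires printf to understand octal escapes, so a shell whose printf does not handle \x (dash, for example) would pass the text through literally. A minimal sketch of the same idea with octal escapes follows, along with a byte dump to check that the bracket expressions really contain the raw bytes. The variable names (multibyte, hex, wordregex) are mine and purely illustrative, and this is as untested against git itself as the original.

```shell
#!/bin/sh
# Same byte ranges as in the command above, written with octal escapes:
# \300 = 0xc0, \377 = 0xff, \200 = 0x80, \277 = 0xbf.
multibyte=$(printf '[\300-\377][\200-\277]+')

# Dump the generated bytes as hex so the ranges can be checked by eye.
hex=$(printf '%s' "$multibyte" | od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//')
echo "$hex"

# Full word regex: words, then any single non-space byte, then multi-byte sequences.
wordregex="[[:alnum:]_]+|[^[:space:]]|$multibyte"

# If the dump looks right, the value could then be stored with:
#   git config --global diff.wordregex "$wordregex"
```

If the dump shows the expected bytes, the regex can also be tried for a single run with git diff --word-diff --word-diff-regex="$wordregex" before putting it in the config.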

Best Wishes

Phillip


