Re: [PATCH] gitweb: Measure offsets against UTF-8 flagged string

> One solution would be to force conversion to UTF-8 on input via "open"
> pragma (e.g. "use open ':encoding(UTF-8)';").  But there is no
> UTF-8-with_fallback encoding available - we would have to write one, and
> install it as module (or fake it via Perl trickery).  This mechanism is
> almost the same as what we currently use in gitweb.

Yes, I tried using `Encode::Guess` with the "open" pragma, but had no luck.
https://perldoc.perl.org/Encode/Guess.html
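
For reference, the UTF-8-with-fallback idea from the quoted text could be
sketched roughly as below.  This is only an illustration, not gitweb's
actual code: the helper name decode_with_fallback() is hypothetical, and
passing Shift_JIS as the fallback is an assumption made for this example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: try strict UTF-8 first and, if the bytes are not
# valid UTF-8, decode them with the given fallback encoding instead.
sub decode_with_fallback {
    my ($bytes, $fallback) = @_;

    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps
    # $bytes untouched so the fallback still sees the original data.
    my $text = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return $text if defined $text;

    return decode($fallback, $bytes);
}

# Valid UTF-8 is decoded as UTF-8; anything else goes to the fallback.
my $utf8_text = decode_with_fallback("\xe6\x96\xb0", 'Shift_JIS');
my $sjis_text = decode_with_fallback("\x83\x67\x83\x89\x83\x43", 'Shift_JIS');
```

A helper like this only works on a string that is in one encoding from
start to end.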

I'm also afraid that the "open" pragma will not work properly with
git_blame_common().  Suppose someone with non-ASCII characters in
their name commits content that is not UTF-8 encoded.  git-blame will
combine both on the same line.  The following is an example:

$ git blame dummy | xxd
00000000: 3461 6464 3565 6331 2028 e585 90e5 b3b6  4add5ec1 (......
00000010: 20e6 96b0 2032 3031 382d 3035 2d30 3320   ... 2018-05-03
00000020: 3232 3a34 383a 3432 202b 3039 3030 2031  22:48:42 +0900 1
00000030: 2920 8367 8389 8343 0a                   ) .g...C.

    * e585 90e5 b3b6 20e6 96b0 : my name, encoded with UTF-8
    * 8367 8389 8343           : "トライ" encoded with Shift_JIS

This means I need to split each line of git-blame output before any
decoding takes place, then decode the first half as UTF-8 and the
second half as Shift_JIS.
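
A minimal sketch of that split, using the bytes from the hex dump above.
It assumes the default (non-porcelain) git-blame line layout shown there,
and that the content encoding (Shift_JIS here) is already known by some
other means; the variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw bytes of the blame line from the hex dump (trailing newline removed).
my $line = "4add5ec1 (\xe5\x85\x90\xe5\xb3\xb6 \xe6\x96\xb0 "
         . "2018-05-03 22:48:42 +0900 1) \x83\x67\x83\x89\x83\x43";

# Split on the raw bytes, before any decoding: the metadata part ends at
# the first "<line number>) ".  Literal spaces and [0-9] are used so that
# high bytes in the author name are never treated as whitespace or digits.
my ($meta, $content) = $line =~ /\A(.*? [0-9]+\)) (.*)\z/s
    or die "unexpected blame line format";

# Decode each half with its own encoding.
my $meta_text    = decode('UTF-8', $meta);
my $content_text = decode('Shift_JIS', $content);   # assumed encoding

binmode(STDOUT, ':encoding(UTF-8)');
print "$meta_text $content_text\n";
```

An author name that itself contains something like " 1) " would break
this pattern, so the split point may need more care than this.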

Sincerely,

-- 
Shin Kojima


