Re: Git messes up 'ø' character

On Tue, Jan 20, 2015 at 10:38 PM, Noralf Trønnes <notro@xxxxxxxxxxx> wrote:
> On 20.01.2015 22:26, Ævar Arnfjörð Bjarmason wrote:
>
>> On Tue, Jan 20, 2015 at 10:23 PM, Noralf Trønnes <notro@xxxxxxxxxxx>
>> wrote:
>>>
>>> On 20.01.2015 21:45, Ævar Arnfjörð Bjarmason wrote:
>>>
>>>> On Tue, Jan 20, 2015 at 9:17 PM, Noralf Trønnes <notro@xxxxxxxxxxx>
>>>> wrote:
>>>>>
>>>>> On 20.01.2015 21:07, Torsten Bögershausen wrote:
>>>>>>
>>>>>> On 2015-01-20 20.46, Noralf Trønnes wrote:
>>>>>> Could it be that your "ø" is not encoded as UTF-8,
>>>>>> but as ISO-8859-15 (or similar)?
>>>>>>
>>>>>>> $ git log -1
>>>>>>> commit b2a4f6abdb097c4dc092b56995a2af8e42fbea79
>>>>>>> Author: Noralf Tr<F8>nnes <notro@xxxxxxxxxxx>
>>>>>>
>>>>>> What does
>>>>>> git config -l | grep Noralf | xxd
>>>>>> say?
>>>>>>
>>>>> $ git config -l | grep Noralf | xxd
>>>>> 0000000: 7573 6572 2e6e 616d 653d 4e6f 7261 6c66  user.name=Noralf
>>>>> 0000010: 2054 72f8 6e6e 6573 0a                    Tr.nnes.
>>>>>
>>>>> $ file ~/.gitconfig
>>>>> /home/pi/.gitconfig: ISO-8859 text
>>>>
>>>> What's happened here is that:
>>>>
>>>>    1. You've authored your commit in ISO-8859-1
>>>>    2. Git itself has no place for the encoding of the author name in the
>>>> commit object format
>>>>    3. git-send-email has a --compose-encoding option which I think would
>>>> sort this out if you set it to ISO-8859-1, but it defaults to UTF-8
>>>>    4. Your patch is actually an ISO-8859-1 byte sequence, but is
>>>> advertised as UTF-8
>>>>    5. You end up with a screwed-up commit
>>>>
>>>> You could work around this, but I suggest just joining the 21st
>>>> century and working exclusively in UTF-8; it makes things much easier,
>>>> speaking as someone with 3x as many non-ASCII characters in his name
>>>> as you :)
>>>>
>>> Ok, then the question is: How do I switch to UTF-8?
>>>
>>> To me it seems I'm already using it:
>>> $ locale charmap
>>> UTF-8
>>
>> Your .gitconfig has an ISO-8859-1 string, from an earlier mail of yours:
>>
>>> $ git config -l | grep Noralf | xxd
>>> 0000000: 7573 6572 2e6e 616d 653d 4e6f 7261 6c66  user.name=Noralf
>>> 0000010: 2054 72f8 6e6e 6573 0a                    Tr.nnes.
>>
>> On a system configured for UTF-8 this would be:
>>
>> $ echo Noralf Trønnes | xxd
>> 0000000: 4e6f 7261 6c66 2054 72c3 b86e 6e65 730a  Noralf Tr..nnes.
>>
>> Note the "f8" vs. "c3 b8".
>>
>
> Yes:
> $ echo Noralf Trønnes | xxd
> 0000000: 4e6f 7261 6c66 2054 72f8 6e6e 6573 0a    Noralf Tr.nnes.
>
> Is there a command I can run that shows that I'm using ISO-8859-1?
> I need something to search for; my previous search only turned up locale
> settings, which seem fine.

What does this give you? This is what I get in UTF-8:

$ echo git commit --author="Noralf Trønnes <notro@xxxxxxxxxxx>" | xxd
0000000: 6769 7420 636f 6d6d 6974 202d 2d61 7574  git commit --aut
0000010: 686f 723d 4e6f 7261 6c66 2054 72c3 b86e  hor=Noralf Tr..n
0000020: 6e65 7320 3c6e 6f74 726f 4074 726f 6e6e  nes <notro@tronn
0000030: 6573 2e6f 7267 3e0a                      es.org>.
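
The same comparison can be made without depending on what your terminal
sends when you type "ø". The following is only a sketch: printf with octal
escapes writes exact bytes, and od is used in place of xxd merely because it
is POSIX. The function names are made up for this mail.

```shell
# Emit "Trønnes" with ø as the two UTF-8 bytes c3 b8 (octal 303 270).
name_utf8() { printf 'Tr\303\270nnes\n'; }

# Emit "Trønnes" with ø as the single Latin-1 byte f8 (octal 370).
name_latin1() { printf 'Tr\370nnes\n'; }

# Hex-dump stdin as space-separated bytes on a single line.
# Word splitting of od's output collapses its padding and newlines.
hexbytes() { set -- $(od -An -tx1); echo "$*"; }

name_utf8 | hexbytes     # 54 72 c3 b8 6e 6e 65 73 0a
name_latin1 | hexbytes   # 54 72 f8 6e 6e 65 73 0a
```

If typing the name and piping it through hexbytes gives the second line,
whatever sits between your keyboard and the shell is producing Latin-1.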

To see if you're using UTF-8, look at the bytes emitted for the
non-ASCII characters you're using and check whether they form valid
UTF-8. E.g. you can check this out:
http://en.wikipedia.org/wiki/%C3%98#Computers

That page shows that "ø" is C3 B8 in UTF-8 but F8 in Latin-1. You're
emitting F8; I'm emitting C3 B8.
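
As for a command that tells you whether a file is UTF-8: one possible
approach, assuming iconv is installed, is to ask it to decode the file as
UTF-8 and look at the exit status. This is a sketch rather than a definitive
recipe; is_utf8 and latin1_to_utf8 are hypothetical helper names, and the
demo runs on a throwaway file instead of your real ~/.gitconfig.

```shell
# Succeeds (exit 0) only if the file decodes cleanly as UTF-8.
is_utf8() { iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1; }

# Write a UTF-8 copy of a Latin-1 file; the original is left untouched.
latin1_to_utf8() { iconv -f ISO-8859-1 -t UTF-8 "$1" > "$2"; }

# Demo: a config-like file whose name contains the Latin-1 byte f8.
tmp=$(mktemp -d)
printf '[user]\n\tname = Noralf Tr\370nnes\n' > "$tmp/old"

is_utf8 "$tmp/old" || echo "old: not valid UTF-8"
latin1_to_utf8 "$tmp/old" "$tmp/new"
is_utf8 "$tmp/new" && echo "new: valid UTF-8"

rm -rf "$tmp"
```

For the real file you would convert ~/.gitconfig to a new file the same way,
inspect it, and only then move it into place.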