Re: [PATCH 1/2] [RFC] add --recode-patch option to git-mailinfo

On 13:03 Sun 06 Jun     , Junio C Hamano wrote:
> Zhang Le <r0bertz@xxxxxxxxxx> writes:
> 
> > I have a translation project which uses UTF-8 as charset.
> > So the patch must be encoded in UTF-8, not just the commit msg etc.
> > And we use google group as our mailing list.
> >
> > Recently, due to unknown reason, mails saved from gmail are encoded using GB2312.
> > This never happened before. I guess Google has changed something.
> > But I haven't found how to change this behavior.
> >
> > So I took another way, i.e. add this option to git-mailinfo.
> > I hope this could benefit others as well.
> >
> > Signed-off-by: Zhang Le <r0bertz@xxxxxxxxxx>
> > ---
> >  builtin/mailinfo.c  |    8 +++++++-
> >  man1/git-mailinfo.1 |    7 ++++++-
> 
> Don't patch anything in man?/ as they are autogenerated files and not
> source; patch the source file in Documentation/ directory instead.

Thanks, will do it.

> 
> I take it that you recode from whatever encoding the mail message is in
> (probably stated in "Content-type: ...; charset=xxx" header) to the
> encoding specified with --encoding option (defaulting to UTF-8), but it
> wasn't very clear from the documentation.  We might want to improve 
> the descriptions of both this new option and --encoding option.

Yes, that is exactly what this patch does.
I will try to improve the doc.
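
To make the intended behaviour concrete for the doc, here is a minimal
sketch in Python rather than the C in builtin/mailinfo.c. It is not the
patch itself. The function name recode_body and the sample message are
made up for illustration, and it assumes a single-part text mail.

```python
from email import message_from_bytes


def recode_body(raw_mail: bytes, target: str = "utf-8") -> bytes:
    """Return the body of a single-part mail re-encoded as `target`.

    Hypothetical helper for illustration only, not git's code.
    """
    msg = message_from_bytes(raw_mail)
    # Charset declared in "Content-Type: ...; charset=xxx"; assume
    # US-ASCII when the header does not name one.
    source = msg.get_content_charset() or "us-ascii"
    # decode=True undoes the Content-Transfer-Encoding and returns
    # the body as bytes in the declared charset.
    body = msg.get_payload(decode=True)
    return body.decode(source).encode(target)


# Made-up sample: a patch line that was recoded to GB2312 in transit.
sample = (
    b"Content-Type: text/plain; charset=GB2312\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
) + '+msgstr "汉语"\n'.encode("gb2312")

print(recode_body(sample).decode("utf-8"), end="")
```

The point is that the patch text, not only the commit message, is
converted from the mail's declared charset to the target encoding.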

> 
> Also it might be useful to find out what that "due to unknown reason" is,
> at least to see if that is what Google did or what the user did.

One of my friends, Yang Xiaoguang, found that Google tries to detect the
language of an email and recodes it into that language's native charset:
GB2312 for Simplified Chinese and Big5 for Traditional Chinese.

In the test, Yang sent every mail in UTF-8 to a Google group and then
checked the "Content-type: ...; charset=xxx" header in Gmail.

If the mail is written in Simplified Chinese, the charset became GB2312.
If the mail is written in Traditional Chinese, the charset became Big5.
If the mail mixes Simplified and Traditional Chinese, the charset
remained UTF-8.
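
The three results above are consistent with a rule like "use the first
legacy charset that can represent the whole text, otherwise keep
UTF-8". The sketch below only illustrates that rule. It is a guess, not
Google's actual algorithm, and the function name guess_legacy_charset
and the candidate order are my assumptions.

```python
def guess_legacy_charset(text: str) -> str:
    """Return the first candidate charset that can encode all of `text`.

    Hypothetical heuristic; the candidate list and its order are
    assumptions made for this illustration.
    """
    for name in ("us-ascii", "gb2312", "big5"):
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # some character is missing from this charset
        return name
    # No single legacy charset covers the text, so stay with UTF-8.
    return "utf-8"


# Simplified only, Traditional only, and a mix of both.
for sample_text in ("汉语", "漢語", "汉語"):
    print(sample_text, "->", guess_legacy_charset(sample_text))
```

A mixed mail contains characters missing from both GB2312 and Big5,
which would explain why only that case stays in UTF-8.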

-- 
Zhang, Le
Gentoo/Loongson Developer
http://zhangle.is-a-geek.org
0260 C902 B8F8 6506 6586 2B90 BC51 C808 1E4E 2973
