On 12:23 Mon 29 Nov , Junio C Hamano wrote: > "ZHANG, Le" <r0bertz@xxxxxxxxxx> writes: > > > This is required for my recode-patch patch which needs a seperate patch_charset variable. > > > > Signed-off-by: ZHANG, Le <r0bertz@xxxxxxxxxx> > > --- > > Thanks. > > Please describe what the new parameter means. Is it used to convert the > contents in "line" from "charset" to "target_charset"? Perhaps it is a > good time to rename the function to "convert_to()", its "charset" > parameter to "from_charset", and name the new parameter "to_charset", in > order to reduce confusion? > > It is not just _your_ patch but will help other/later patches, so you may > want to phrase the proposed commit log message a bit differently (and with > a narrower page width, like 68-74 chars per line). Perhaps like... > > mailinfo.c: Allow convert_to_utf8() to specify both src/dst charset > > The convert_to_utf8() function actually converts to whatever charset > "metainfo_charset" variable contains, which is not necessarily UTF-8. > Rename it to convert_to(), and give an extra parameter "to_charset" to > specify what charset to re-encode to. Also rename its "charset" > parameter to "from_charset" to clarify which is which. Thank you for reviewing! And sorry for late response. For this problem only, I'd like to propose a slightly better approach based on your suggestion. Currently, in mailinfo.c, there are 3 calls to convert_to_utf8(). Two of calls specifies from_charset, the third doesn't. For those two calls which already specifies from_charset, there is no need to guess from_charset. So I make two convert functions. One is guess_and_convert_to(), the other is convert_to(). The next version of repository_encoding patch[1] will use convert_to(). [1] http://thread.gmane.org/gmane.comp.version-control.git/162345/focus=162424 Here is the patch: diff --git a/builtin/mailinfo.c b/builtin/mailinfo.c index 71e6262..0f42ff1 100644 --- a/builtin/mailinfo.c +++ b/builtin/mailinfo.c @@ -472,6 +472,20 @@ static struct strbuf *decode_b_segment(const struct strbuf *b_seg) return out; } +static void convert_to(struct strbuf *line, const char *to_charset, const char *from_charset) +{ + char *out; + + if (!strcasecmp(to_charset, from_charset)) + return; + + out = reencode_string(line->buf, to_charset, from_charset); + if (!out) + die("cannot convert from %s to %s", + from_charset, to_charset); + strbuf_attach(line, out, strlen(out), strlen(out)); +} + /* * When there is no known charset, guess. * @@ -483,32 +497,14 @@ static struct strbuf *decode_b_segment(const struct strbuf *b_seg) * Otherwise, we default to assuming it is Latin1 for historical * reasons. */ -static const char *guess_charset(const struct strbuf *line, const char *target_charset) +static void guess_and_convert_to(struct strbuf *line, const char *to_charset) { - if (is_encoding_utf8(target_charset)) { + if (is_encoding_utf8(to_charset)) { if (is_utf8(line->buf)) - return NULL; - } - return "ISO8859-1"; -} - -static void convert_to_utf8(struct strbuf *line, const char *charset) -{ - char *out; - - if (!charset || !*charset) { - charset = guess_charset(line, metainfo_charset); - if (!charset) return; } - if (!strcasecmp(metainfo_charset, charset)) - return; - out = reencode_string(line->buf, metainfo_charset, charset); - if (!out) - die("cannot convert from %s to %s", - charset, metainfo_charset); - strbuf_attach(line, out, strlen(out), strlen(out)); + convert_to(line, to_charset, "ISO8859-1"); } static int decode_header_bq(struct strbuf *it) @@ -577,7 +573,7 @@ static int decode_header_bq(struct strbuf *it) break; } if (metainfo_charset) - convert_to_utf8(dec, charset_q.buf); + convert_to(dec, metainfo_charset, charset_q.buf); strbuf_addbuf(&outbuf, dec); strbuf_release(dec); @@ -602,7 +598,7 @@ static void decode_header(struct strbuf *it) * This can be binary guck but there is no charset specified. */ if (metainfo_charset) - convert_to_utf8(it, ""); + guess_and_convert_to(it, metainfo_charset); } static void decode_transfer_encoding(struct strbuf *line) @@ -796,7 +792,7 @@ static int handle_commit_msg(struct strbuf *line) /* normalize the log message to UTF-8. */ if (metainfo_charset) - convert_to_utf8(line, charset.buf); + convert_to(line, metainfo_charset, charset.buf); if (use_scissors && is_scissors_line(line)) { int i; -- ZHANG, Le http://zhangle.co 0260 C902 B8F8 6506 6586 2B90 BC51 C808 1E4E 2973
Attachment:
pgpNgm8Qw2jv9.pgp
Description: PGP signature