Re: [PATCH v4 1/4] mailinfo.c: convert_to_utf8(): added a target_charset parameter

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12:23 Mon 29 Nov     , Junio C Hamano wrote:
> "ZHANG, Le" <r0bertz@xxxxxxxxxx> writes:
> 
> > This is required for my recode-patch patch which needs a seperate patch_charset variable.
> >
> > Signed-off-by: ZHANG, Le <r0bertz@xxxxxxxxxx>
> > ---
> 
> Thanks.
> 
> Please describe what the new parameter means.  Is it used to convert the
> contents in "line" from "charset" to "target_charset"?  Perhaps it is a
> good time to rename the function to "convert_to()", its "charset"
> parameter to "from_charset", and name the new parameter "to_charset", in
> order to reduce confusion?
> 
> It is not just _your_ patch but will help other/later patches, so you may
> want to phrase the proposed commit log message a bit differently (and with
> a narrower page width, like 68-74 chars per line).  Perhaps like...
> 
>     mailinfo.c: Allow convert_to_utf8() to specify both src/dst charset
> 
>     The convert_to_utf8() function actually converts to whatever charset
>     "metainfo_charset" variable contains, which is not necessarily UTF-8.
>     Rename it to convert_to(), and give an extra parameter "to_charset" to
>     specify what charset to re-encode to.  Also rename its "charset"
>     parameter to "from_charset" to clarify which is which.

Thank you for reviewing! And sorry for late response.

For this problem only, I'd like to propose a slightly better approach based on
your suggestion.

Currently, in mailinfo.c, there are 3 calls to convert_to_utf8(). Two of calls
specifies from_charset, the third doesn't. For those two calls which already
specifies from_charset, there is no need to guess from_charset. So I make two
convert functions. One is guess_and_convert_to(), the other is convert_to().
The next version of repository_encoding patch[1] will use convert_to().

[1] http://thread.gmane.org/gmane.comp.version-control.git/162345/focus=162424

Here is the patch:

diff --git a/builtin/mailinfo.c b/builtin/mailinfo.c
index 71e6262..0f42ff1 100644
--- a/builtin/mailinfo.c
+++ b/builtin/mailinfo.c
@@ -472,6 +472,20 @@ static struct strbuf *decode_b_segment(const struct strbuf *b_seg)
 	return out;
 }
 
+static void convert_to(struct strbuf *line, const char *to_charset, const char *from_charset)
+{
+	char *out;
+
+	if (!strcasecmp(to_charset, from_charset))
+		return;
+
+	out = reencode_string(line->buf, to_charset, from_charset);
+	if (!out)
+		die("cannot convert from %s to %s",
+		    from_charset, to_charset);
+	strbuf_attach(line, out, strlen(out), strlen(out));
+}
+
 /*
  * When there is no known charset, guess.
  *
@@ -483,32 +497,14 @@ static struct strbuf *decode_b_segment(const struct strbuf *b_seg)
  * Otherwise, we default to assuming it is Latin1 for historical
  * reasons.
  */
-static const char *guess_charset(const struct strbuf *line, const char *target_charset)
+static void guess_and_convert_to(struct strbuf *line, const char *to_charset)
 {
-	if (is_encoding_utf8(target_charset)) {
+	if (is_encoding_utf8(to_charset)) {
 		if (is_utf8(line->buf))
-			return NULL;
-	}
-	return "ISO8859-1";
-}
-
-static void convert_to_utf8(struct strbuf *line, const char *charset)
-{
-	char *out;
-
-	if (!charset || !*charset) {
-		charset = guess_charset(line, metainfo_charset);
-		if (!charset)
 			return;
 	}
 
-	if (!strcasecmp(metainfo_charset, charset))
-		return;
-	out = reencode_string(line->buf, metainfo_charset, charset);
-	if (!out)
-		die("cannot convert from %s to %s",
-		    charset, metainfo_charset);
-	strbuf_attach(line, out, strlen(out), strlen(out));
+    convert_to(line, to_charset, "ISO8859-1");
 }
 
 static int decode_header_bq(struct strbuf *it)
@@ -577,7 +573,7 @@ static int decode_header_bq(struct strbuf *it)
 			break;
 		}
 		if (metainfo_charset)
-			convert_to_utf8(dec, charset_q.buf);
+			convert_to(dec, metainfo_charset, charset_q.buf);
 
 		strbuf_addbuf(&outbuf, dec);
 		strbuf_release(dec);
@@ -602,7 +598,7 @@ static void decode_header(struct strbuf *it)
 	 * This can be binary guck but there is no charset specified.
 	 */
 	if (metainfo_charset)
-		convert_to_utf8(it, "");
+		guess_and_convert_to(it, metainfo_charset);
 }
 
 static void decode_transfer_encoding(struct strbuf *line)
@@ -796,7 +792,7 @@ static int handle_commit_msg(struct strbuf *line)
 
 	/* normalize the log message to UTF-8. */
 	if (metainfo_charset)
-		convert_to_utf8(line, charset.buf);
+		convert_to(line, metainfo_charset, charset.buf);
 
 	if (use_scissors && is_scissors_line(line)) {
 		int i;


-- 
ZHANG, Le
http://zhangle.co
0260 C902 B8F8 6506 6586 2B90 BC51 C808 1E4E 2973

Attachment: pgpNgm8Qw2jv9.pgp
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]