Re: [PATCH] log --format: document %w

Junio C Hamano <gitster@xxxxxxxxx> · Sun, 22 Nov 2009 09:10:14 -0800

René Scharfe <rene.scharfe@xxxxxxxxxxxxxx> writes:

> I'm not especially proud of the triple negative in that note.  How to say it
> better, yet concise?
> +- '%w([<w>[,<i1>[,<i2>]]])': switch line wrapping, like the -w option of
> +  linkgit:git-shortlog[1].  NOTE: Color placeholders (`%C*`) are not
> +  recognized as having no width, so they should not be put into wrapped
> +  sections.

"The code miscounts the width of '%C*' color placeholders"?

Perhaps somebody in the codepath leading to pick_one_utf8_char() in utf8.c
can be made aware of them?

utf8_width() is called from many places (has one caller outside utf8.c as
well).  It is given a pointer to a pointer that points at the current
position in a string, and is responsible for picking up one logical letter
advancing the given pointer to skip over that letter, and returning the
display width of that one letter.  The function wants the string to be
encoded in utf-8 and signals by putting NULL in the pointer when it
detects the input string is not.

Picking up one logical letter is done by pick_one_utf8_char(), which is a
nicely written generic "We are at the character boundary of a potentially
multi-byte utf-8 string; pick the first character" implementation, and we
wouldn't want to contaminate that with escape sequence logic---we might
want to reuse it in other codepaths where we have no reason to expect any
escape sequences.

So perhaps we can introduce is_esc_sequence(s, r, w) that

 - returns true if we are at the beginning of an esc-sequence;
 - skips the sequence just like utf8_width() does with s and r; and
 - counts the width of the sequence and returns it in *w

The implementation of the is_esc_sequence() could be to only detect the
color sequence (if the sequence has things like cursor-position control
then we are already lost, as calling "utf8_width()" on such a string does
not make much sense anyway) and report zero-width.

I dunno.

diff --git a/utf8.c b/utf8.c
index 5c18f0c..d45e75f 100644
--- a/utf8.c
+++ b/utf8.c
@@ -241,7 +241,12 @@ invalid:
  */
 int utf8_width(const char **start, size_t *remainder_p)
 {
-	ucs_char_t ch = pick_one_utf8_char(start, remainder_p);
+	ucs_char_t ch;
+	int w;
+
+	if (is_esc_sequence(start, remainder_p, &w))
+		return w;
+	ch = pick_one_utf8_char(start, remainder_p);
 	if (!*start)
 		return 0;
 	return git_wcwidth(ch);
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html