Junio C Hamano wrote:
> René Scharfe <rene.scharfe@xxxxxxxxxxxxxx> writes:
>
>> I'm not especially proud of the triple negative in that note.  How to say it
>> better, yet concise?
>> +- '%w([<w>[,<i1>[,<i2>]]])': switch line wrapping, like the -w option of
>> +  linkgit:git-shortlog[1].  NOTE: Color placeholders (`%C*`) are not
>> +  recognized as having no width, so they should not be put into wrapped
>> +  sections.
>
> "The code miscounts the width of '%C*' color placeholders"?
>
> Perhaps somebody in the codepath leading to pick_one_utf8_char() in utf8.c
> can be made aware of them?
>
> utf8_width() is called from many places (has one caller outside utf8.c as
> well).  It is given a pointer to a pointer that points at the current
> position in a string, and is responsible for picking up one logical letter,
> advancing the given pointer to skip over that letter, and returning the
> display width of that one letter.  The function wants the string to be
> encoded in utf-8 and signals by putting NULL in the pointer when it
> detects the input string is not.
>
> Picking up one logical letter is done by pick_one_utf8_char(), which is a
> nicely written generic "We are at the character boundary of a potentially
> multi-byte utf-8 string; pick the first character" implementation, and we
> wouldn't want to contaminate that with escape sequence logic---we might
> want to reuse it in other codepaths where we have no reason to expect any
> escape sequences.
>
> So perhaps we can introduce is_esc_sequence(s, r, w) that
>
>  - returns true if we are at the beginning of an esc-sequence;
>  - skips the sequence just like utf8_width() does with s and r; and
>  - counts the width of the sequence and returns it in *w
>
> The implementation of the is_esc_sequence() could be to only detect the
> color sequence (if the sequence has things like cursor-position control
> then we are already lost, as calling "utf8_width()" on such a string does
> not make much sense anyway) and report zero-width.
>
> I dunno.
>
> diff --git a/utf8.c b/utf8.c
> index 5c18f0c..d45e75f 100644
> --- a/utf8.c
> +++ b/utf8.c
> @@ -241,7 +241,12 @@ invalid:
>   */
>  int utf8_width(const char **start, size_t *remainder_p)
>  {
> -	ucs_char_t ch = pick_one_utf8_char(start, remainder_p);
> +	ucs_char_t ch;
> +	int w;
> +
> +	if (is_esc_sequence(start, remainder_p, &w))
> +		return w;
> +	ch = pick_one_utf8_char(start, remainder_p);
>  	if (!*start)
>  		return 0;
>  	return git_wcwidth(ch);

I think utf8_width() is too generic for that; we shouldn't teach it
terminal control details.

Something like this?  It keeps it all local to strbuf_add_wrapped_text();
ignoring display mode escape codes in there can be justified with its
purpose.

 utf8.c |   23 ++++++++++++++++++++++-
 1 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/utf8.c b/utf8.c
index 5c18f0c..fcc0aeb 100644
--- a/utf8.c
+++ b/utf8.c
@@ -298,6 +298,21 @@ static void print_spaces(struct strbuf *buf, int count)
 		strbuf_write(buf, s, count);
 }
 
+/* XXX: this handles display mode sequences, only.  Do we need more? */
+static size_t esc_sequence_len(const char *s)
+{
+	const char *p = s;
+	if (*p++ != '\033')
+		return 0;
+	if (*p++ != '[')
+		return 0;
+	while (isdigit(*p) || *p == ';')
+		p++;
+	if (*p++ != 'm')
+		return 0;
+	return p - s;
+}
+
 /*
  * Wrap the text, if necessary. The variable indent is the indent for the
  * first line, indent2 is the indent for all other lines.
@@ -329,7 +344,13 @@ int strbuf_add_wrapped_text(struct strbuf *buf,
 	}
 
 	for (;;) {
-		char c = *text;
+		char c;
+		size_t skip;
+
+		while ((skip = esc_sequence_len(text)))
+			text += skip;
+
+		c = *text;
 		if (!c || isspace(c)) {
 			if (w < width || !space) {
 				const char *start = bol;
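To make the intended effect easier to check outside of git, here is a minimal standalone sketch. esc_sequence_len() follows the same logic as in the patch, except that it uses the C library's isdigit() with an unsigned char cast, where git has its own ctype macros. visible_width() is a hypothetical helper that does not exist in git; it only illustrates the "skip the sequence, count nothing" idea and assumes ASCII-only text, so it is no substitute for utf8_width().

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Same logic as esc_sequence_len() in the patch: return the length of a
 * display mode sequence "ESC [ <digits and ';'> m" starting at s, or 0 if
 * s does not start with one.
 */
static size_t esc_sequence_len(const char *s)
{
	const char *p = s;

	if (*p++ != '\033')
		return 0;
	if (*p++ != '[')
		return 0;
	/* The cast avoids undefined behaviour for negative char values. */
	while (isdigit((unsigned char)*p) || *p == ';')
		p++;
	if (*p++ != 'm')
		return 0;
	return p - s;
}

/*
 * Hypothetical helper, not part of git: count the columns an ASCII-only
 * string occupies when display mode sequences are treated as zero-width.
 */
static size_t visible_width(const char *text)
{
	size_t width = 0;

	assert(text != NULL);
	while (*text) {
		size_t skip = esc_sequence_len(text);

		if (skip) {
			/* Jump over the whole sequence without counting it. */
			text += skip;
			continue;
		}
		text++;
		width++;
	}
	return width;
}
```

With these definitions, esc_sequence_len("\033[1;32mfoo") is 7, while a cursor control sequence such as "\033[2J" yields 0 and would still be counted character by character, which matches the XXX comment in the patch.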