From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> Date: Wed, 16 Mar 2016 09:15:53 -0700 Subject: [PATCH] pretty-print: de-tabify indented logs to make things line up properly This should all line up: Column 1 Column 2 -------- -------- A B ABCD EFGH SPACES Instead of Tabs Even with multi-byte UTF8 characters: Column 1 Column 2 -------- -------- Ä B åäö 100 A Møøse once bit my sister.. Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> --- This seems to work for me, and while there is some cost, it's minimal. Doing a "git log > /dev/null" of the current git tree is about 1% slower because of the tab-finding. A tree with a lot of tabs in the commit messages would be more noticeable, because then you actually end up hitting the whole "how wide is this" issue. (But if the tabs are all at the beginning of a line, you'd still be ok and avoid the utf8 width calculations). Comments? pretty.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 74 insertions(+), 2 deletions(-) diff --git a/pretty.c b/pretty.c index 92b2870a7eab..0b40457f99f0 100644 --- a/pretty.c +++ b/pretty.c @@ -1629,6 +1629,76 @@ void pp_title_line(struct pretty_print_context *pp, strbuf_release(&title); } +static int pp_utf8_width(const char *start, const char *end) +{ + int width = 0; + size_t remain = end - start; + + while (remain) { + int n = utf8_width(&start, &remain); + if (n < 0 || !start) + return -1; + width += n; + } + return width; +} + +/* + * pp_handle_indent() prints out the intendation, and + * perhaps the whole line (without the final newline) + * + * Why "perhaps"? If there are tabs in the indented line + * it will print it out in order to de-tabify the line. + * + * But if there are no tabs, we just fall back on the + * normal "print the whole line". + */ +static int pp_handle_indent(struct strbuf *sb, int indent, + const char *line, int linelen) +{ + const char *tab; + + strbuf_addchars(sb, ' ', indent); + + tab = memchr(line, '\t', linelen); + if (!tab) + return 0; + + do { + int width = pp_utf8_width(line, tab); + + /* + * If it wasn't well-formed utf8, or it + * had characters with badly defined + * width (control characters etc), just + * give up on trying to align things. + */ + if (width < 0) + break; + + /* Output the data .. */ + strbuf_add(sb, line, tab - line); + + /* .. and the de-tabified tab */ + strbuf_addchars(sb, ' ', 8-(width & 7)); + + /* Skip over the printed part .. */ + linelen -= 1+tab-line; + line = tab + 1; + + /* .. and look for the next tab */ + tab = memchr(line, '\t', linelen); + } while (tab); + + /* + * Print out everything after the last tab without + * worrying about width - there's nothing more to + * align. + */ + strbuf_add(sb, line, linelen); + return 1; +} + void pp_remainder(struct pretty_print_context *pp, const char **msg_p, struct strbuf *sb, @@ -1652,8 +1722,10 @@ void pp_remainder(struct pretty_print_context *pp, first = 0; strbuf_grow(sb, linelen + indent + 20); - if (indent) - strbuf_addchars(sb, ' ', indent); + if (indent) { + if (pp_handle_indent(sb, indent, line, linelen)) + linelen = 0; + } strbuf_add(sb, line, linelen); strbuf_addch(sb, '\n'); } -- 2.8.0.rc2 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html