Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Date: Wed, 16 Mar 2016 09:15:53 -0700
> Subject: [PATCH] pretty-print: de-tabify indented logs to make things line up properly
>
> This should all line up:
>
>   Column 1	Column 2
>   --------	--------
>   A		B
>   ABCD		EFGH
>   SPACES        Instead of Tabs
>
> Even with multi-byte UTF8 characters:
>
>   Column 1	Column 2
>   --------	--------
>   Ä		B
>   åäö		100
>   A Møøse	once bit my sister..
>
> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> ---
>
> This seems to work for me, and while there is some cost, it's minimal.
> Doing a "git log > /dev/null" of the current git tree is about 1% slower
> because of the tab-finding. A tree with a lot of tabs in the commit
> messages would be more noticeable, because then you actually end up
> hitting the whole "how wide is this" issue.
>
> (But if the tabs are all at the beginning of a line, you'd still be ok
> and avoid the utf8 width calculations).
>
> Comments?

I stared at it for a while, and didn't spot anything wrong with it.

I did wonder about two things, though:

 (1) whether turning your "preparation; do { ... } while ()" into
     "while () { }" would make the result a bit easier to read; and

 (2) whether we can somehow eliminate the duplication of "tab + 1"
     (spelled differently on the previous line as "1+tab"), which may
     make the end result easier to follow.

But both are minor.
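
For what it is worth, here is a minimal sketch of what (1) and (2) might
look like, written as a standalone function into a plain char buffer
rather than against strbuf. sketch_width() and sketch_detabify() are
hypothetical names. The width helper simply counts UTF-8 code points, so
unlike git's utf8_width() it treats every character as one column and
never fails; that is why the "give up if width < 0" branch is absent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for git's utf8_width(): one column per UTF-8
 * code point, i.e. per byte that is not a continuation byte (10xxxxxx).
 * It knows nothing about double-width or zero-width characters and
 * never reports an error.
 */
static int sketch_width(const char *start, const char *end)
{
	int width = 0;

	for (; start < end; start++)
		if (((unsigned char)*start & 0xC0) != 0x80)
			width++;
	return width;
}

/*
 * De-tabify one line into out, which must hold at least 8 * linelen + 1
 * bytes.  Same idea as pp_handle_indent(), but the tab search is the loop
 * condition and the "step over the tab" amount is computed only once.
 */
static void sketch_detabify(char *out, const char *line, size_t linelen)
{
	const char *tab;

	assert(out != NULL);
	while ((tab = memchr(line, '\t', linelen)) != NULL) {
		size_t seglen = (size_t)(tab - line);	/* bytes before the tab */
		int pad = 8 - (sketch_width(line, tab) & 7);	/* 1..8 spaces */

		memcpy(out, line, seglen);	/* the data .. */
		out += seglen;
		memset(out, ' ', (size_t)pad);	/* .. and the de-tabified tab */
		out += pad;

		/* the only place that steps past the tab */
		line += seglen + 1;
		linelen -= seglen + 1;
	}
	memcpy(out, line, linelen);	/* nothing left to align */
	out[linelen] = '\0';
}
```

Since the memchr() call is the loop condition, the separate search before
the loop goes away, and the step past the tab is spelled once, as
seglen + 1.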

>  pretty.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 74 insertions(+), 2 deletions(-)
>
> diff --git a/pretty.c b/pretty.c
> index 92b2870a7eab..0b40457f99f0 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -1629,6 +1629,76 @@ void pp_title_line(struct pretty_print_context *pp,
>  	strbuf_release(&title);
>  }
>  
> +static int pp_utf8_width(const char *start, const char *end)
> +{
> +	int width = 0;
> +	size_t remain = end - start;
> +
> +	while (remain) {
> +		int n = utf8_width(&start, &remain);
> +		if (n < 0 || !start)
> +			return -1;
> +		width += n;
> +	}
> +	return width;
> +}
> +
> +/*
> + * pp_handle_indent() prints out the indentation, and
> + * perhaps the whole line (without the final newline)
> + *
> + * Why "perhaps"? If there are tabs in the indented line
> + * it will print it out in order to de-tabify the line.
> + *
> + * But if there are no tabs, we just fall back on the
> + * normal "print the whole line".
> + */
> +static int pp_handle_indent(struct strbuf *sb, int indent,
> +			    const char *line, int linelen)
> +{
> +	const char *tab;
> +
> +	strbuf_addchars(sb, ' ', indent);
> +
> +	tab = memchr(line, '\t', linelen);
> +	if (!tab)
> +		return 0;
> +
> +	do {
> +		int width = pp_utf8_width(line, tab);
> +
> +		/*
> +		 * If it wasn't well-formed utf8, or it
> +		 * had characters with badly defined
> +		 * width (control characters etc), just
> +		 * give up on trying to align things.
> +		 */
> +		if (width < 0)
> +			break;
> +
> +		/* Output the data .. */
> +		strbuf_add(sb, line, tab - line);
> +
> +		/* .. and the de-tabified tab */
> +		strbuf_addchars(sb, ' ', 8-(width & 7));
> +
> +		/* Skip over the printed part .. */
> +		linelen -= 1+tab-line;
> +		line = tab + 1;
> +
> +		/* .. and look for the next tab */
> +		tab = memchr(line, '\t', linelen);
> +	} while (tab);
> +
> +	/*
> +	 * Print out everything after the last tab without
> +	 * worrying about width - there's nothing more to
> +	 * align.
> +	 */
> +	strbuf_add(sb, line, linelen);
> +	return 1;
> +}
> +
>  void pp_remainder(struct pretty_print_context *pp,
>  		  const char **msg_p,
>  		  struct strbuf *sb,
> @@ -1652,8 +1722,10 @@ void pp_remainder(struct pretty_print_context *pp,
>  		first = 0;
>  
>  		strbuf_grow(sb, linelen + indent + 20);
> -		if (indent)
> -			strbuf_addchars(sb, ' ', indent);
> +		if (indent) {
> +			if (pp_handle_indent(sb, indent, line, linelen))
> +				linelen = 0;
> +		}
>  		strbuf_add(sb, line, linelen);
>  		strbuf_addch(sb, '\n');
>  	}
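
On the arithmetic in pp_handle_indent(): width is measured from line,
which is reset to tab + 1 after every tab, rather than from the start of
the message line (the indent spaces added in front are not counted
either). That still puts every tab stop in the right place, because each
de-tabified tab leaves the output on a multiple of 8 columns, so
(col + width) & 7 equals width & 7. The helpers tab_padding() and
check_segments() below are hypothetical, written only to spell out that
invariant; they are not part of pretty.c.

```c
#include <assert.h>

/* Padding emitted for a tab after a segment of this display width. */
static int tab_padding(int width)
{
	return 8 - (width & 7);	/* always 1..8 */
}

/*
 * Given the display widths of the segments between successive tabs,
 * confirm that measuring each width from the start of its segment (as
 * pp_handle_indent() does) yields the same padding as measuring from
 * the start of the line.  Returns the column reached after the last tab.
 */
static int check_segments(const int *widths, int n)
{
	int col = 0;	/* column at which the current segment starts */

	for (int i = 0; i < n; i++) {
		int relative = tab_padding(widths[i]);
		int absolute = 8 - ((col + widths[i]) & 7);

		assert(col % 8 == 0);	/* every segment starts on a tab stop */
		assert(relative == absolute);
		col += widths[i] + relative;
	}
	return col;
}
```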