Re: Tabs in commit messages - de-tabify option in strbuf_stripspace()?

Jeff King <peff@xxxxxxxx> · Wed, 16 Mar 2016 01:17:49 -0400

On Tue, Mar 15, 2016 at 09:57:16PM -0700, Linus Torvalds wrote:

> On Tue, Mar 15, 2016 at 9:46 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> >
> > It also ignores that byte counts of non-HT bytes may not necessarily
> > match display columns.  There is utf8_width() function exported from
> > utf8.c for that purpose.
> 
> Hmm. I did that to now make it horribly slower. Doing the
> per-character widths really does horrible things for performance.
> 
> That said, I think I can make it do the fast thing for lines that have
> no tabs at all, which is the big bulk of it. So it would do the width
> calculations only in the rare "yes, I found a tab" case.
> 
> I already wrote it in a way that non-tab lines end up just looping
> over the line looking for a tab and then fall back to the old code.
> 
> I might just have to make that behavior a bit more explicit. Let me
> stare at it a bit, but it won't happen until tomorrow, I think.

I wondered about performance when reading yours, and measured a straight
"git log >/dev/null" on the kernel at about 3% slower (best-of-five and
average). We can get most of that back by sticking a

    memchr(line, '\t', linelen);

in front and skipping the loop if it returns NULL. You can also
implement your loop in terms of memchr() calls to skip to the next tab,
but I don't know if it's worth it. It's really only worth spending time
optimizing the non-tab case (and even then, only worth it for trivial
things like "use the already optimized-as-hell memchr").

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html