Am 06.02.23 um 17:18 schrieb Ævar Arnfjörð Bjarmason: > > On Mon, Feb 06 2023, Derrick Stolee wrote: > >> On 2/5/2023 4:12 PM, Ævar Arnfjörð Bjarmason wrote: > >>> Or even just: >>> >>> strbuf_addf(tree_path, "%*.s/", (int)entry.pathlen, entry.path); >> >> Please do not add "addf" functions that can be run in tight loops. >> It's faster to do strbuf_add() followed by strbuf_addch(). > > Good point. > > I wondered just how much slower, and it's up to 3x! At least according > to this[1] artificial test case (where I usurped a random test helper). > > I wondered if we could just handle some common strbuf_addf() cases > ourselves, and the benchmark shows (manually annotated, too lazy to set > up the -n option): > > git hyperfine -L rev HEAD~5,HEAD~4,HEAD~3,HEAD~2,HEAD~1,HEAD~0 -s 'make CFLAGS=-O3' './t/helper/test-tool online-cpus' -r 3 > [...] > Summary > './t/helper/test-tool online-cpus' in 'HEAD~0' ran <== strbuf_add() + strbuf_addch() > 1.06 ± 0.11 times faster than './t/helper/test-tool online-cpus' in 'HEAD~1' <== strbuf_addstr() + strbuf_addch() > 1.18 ± 0.12 times faster than './t/helper/test-tool online-cpus' in 'HEAD~4' <== hand optimized strbuf_addf() for "%sC" > 1.33 ± 0.18 times faster than './t/helper/test-tool online-cpus' in 'HEAD~2' <== hand optimized strbuf_addf() for "%*sC" > 2.63 ± 0.05 times faster than './t/helper/test-tool online-cpus' in 'HEAD~5' <== strbuf_addf("%s/") > 2.92 ± 0.25 times faster than './t/helper/test-tool online-cpus' in 'HEAD~3' <== strbuf_addf("%*s/") Woah! > The "hand optimization" just being a very stupid handling of "%sC" for > arbitrary values of a single char "C", and ditto for "%*sC" (which > curiously is slower here). "%*s" adds padding if needed, your version doesn't. Perhaps you thought of "%.*s"? That might be relevant because for "%*s" vsnprintf(3) needs to run strlen(3) again on the argument, while for "%.*s" it can stop when the given length is reached. > So, for truly hot loops we'd still want to use the add + addch, but if > anyone's interested (hashtag #leftoverbits) it looks like we could get > some easy wins (and reduction in code size, as we could stop worrying > about addf being slow in most cases) with some very dumb minimal > vaddf(), which could handle these cases (but not anything involving > padding etc.). > > I didn't dig, but wouldn't be surprised if the reason is that C > libraries need to carry a relatively fat & general sprintf() for all > those edge cases, locale handling etc, whereas most of our use could > trivially be represented as some sequence of addstr()/addf() etc. If that's the reason then resisting the urge to handle ever more cases in strbuf_addf() would be quite important. > Another interesting approach (and this is very #leftoverbits) would be > to perform the same optimization with coccinelle. > > I.e. our current use of it is purely "this code X should be written like > Y, and we should commit Y". > > But there's no reason for why we couldn't effectively implement our own > compiler optimizations for our own APIs with it, so just grab "%s/" etc, > unpack that in OCaml, then emit strbuf_add() + strbuf_addch(), and that > would be what the C compiler would see. Extracting the %s is technically possible using a semantic patch without scripting: @@ expression sb, str; format fmt =~ "^s$"; @@ + strbuf_addstr(sb, str); strbuf_addf(sb, "%@fmt@..." - , str + + 2 ); The "+ 2" is ugly and runs afoul of compiler warning -Wstring-plus-int, though. Resolving this probably requires Python scripting as in https://github.com/coccinelle/coccinelle/blob/master/demos/format.cocci, or the OCaml magic you have in mind. I have to admit that I don't even understand the linked examples, however. :-/ The warning can be avoided by using an array subscription, by the way, but that's even uglier: @@ expression sb, str; format fmt =~ "^s$"; @@ + strbuf_addstr(sb, str); strbuf_addf(sb, + & "%@fmt@..." - , str + [2] ); > > 1. > > 9d23ffb1117 addf + nolen > diff --git a/t/helper/test-online-cpus.c b/t/helper/test-online-cpus.c > index 8cb0d53840f..c802ec579d0 100644 > --- a/t/helper/test-online-cpus.c > +++ b/t/helper/test-online-cpus.c > @@ -1,9 +1,17 @@ > #include "test-tool.h" > #include "git-compat-util.h" > #include "thread-utils.h" > * +#include "strbuf.h" > > int cmd__online_cpus(int argc, const char **argv) > { > - printf("%d\n", online_cpus()); > + struct strbuf sb = STRBUF_INIT; > + const char *const str = "Hello, World"; > + > + for (size_t i = 0; i < 10000000; i++) { > + strbuf_reset(&sb); > + strbuf_addf(&sb, "%s/", str); > + puts(sb.buf); > + } > return 0; > } > 9f74eff5623 addf + nolen optimize > diff --git a/strbuf.c b/strbuf.c > index c383f41a3c5..750e5e6a5b4 100644 > --- a/strbuf.c > +++ b/strbuf.c > @@ -332,8 +332,16 @@ void strbuf_addchars(struct strbuf *sb, int c, size_t n) > void strbuf_addf(struct strbuf *sb, const char *fmt, ...) > { > va_list ap; > + > va_start(ap, fmt); > - strbuf_vaddf(sb, fmt, ap); > + if (*fmt == '%' && *(fmt + 1) == 's' && *(fmt + 2) && !*(fmt + 3)) { > + const char *arg = va_arg(ap, const char *); > + > + strbuf_addstr(sb, arg); > + strbuf_addch(sb, *(fmt + 2)); > + } else { > + strbuf_vaddf(sb, fmt, ap); > + } > va_end(ap); > } > > ca60bb9b479 addf + len > diff --git a/t/helper/test-online-cpus.c b/t/helper/test-online-cpus.c > index c802ec579d0..7257e622015 100644 > --- a/t/helper/test-online-cpus.c > +++ b/t/helper/test-online-cpus.c > @@ -7,10 +7,11 @@ int cmd__online_cpus(int argc, const char **argv) > { > struct strbuf sb = STRBUF_INIT; > const char *const str = "Hello, World"; > + const size_t len = strlen(str); > > for (size_t i = 0; i < 10000000; i++) { > strbuf_reset(&sb); > - strbuf_addf(&sb, "%s/", str); > + strbuf_addf(&sb, "%*s/", (int)len, str); > puts(sb.buf); > } > return 0; > 1f47987d095 addf + len optimize > diff --git a/strbuf.c b/strbuf.c > index 750e5e6a5b4..88801268f7a 100644 > --- a/strbuf.c > +++ b/strbuf.c > @@ -334,11 +334,16 @@ void strbuf_addf(struct strbuf *sb, const char *fmt, ...) > va_list ap; > > va_start(ap, fmt); > - if (*fmt == '%' && *(fmt + 1) == 's' && *(fmt + 2) && !*(fmt + 3)) { > + if (*fmt == '%' && > + *(fmt + 1) == '*' && > + *(fmt + 2) == 's' && > + *(fmt + 3) && > + !*(fmt + 4)) { > + int len = va_arg(ap, int); > const char *arg = va_arg(ap, const char *); > > - strbuf_addstr(sb, arg); > - strbuf_addch(sb, *(fmt + 2)); > + strbuf_add(sb, arg, len); > + strbuf_addch(sb, *(fmt + 3)); > } else { > strbuf_vaddf(sb, fmt, ap); > } > 55c698c0b95 addstr > diff --git a/t/helper/test-online-cpus.c b/t/helper/test-online-cpus.c > index 7257e622015..2716b44ca15 100644 > --- a/t/helper/test-online-cpus.c > +++ b/t/helper/test-online-cpus.c > @@ -7,11 +7,11 @@ int cmd__online_cpus(int argc, const char **argv) > { > struct strbuf sb = STRBUF_INIT; > const char *const str = "Hello, World"; > - const size_t len = strlen(str); > > for (size_t i = 0; i < 10000000; i++) { > strbuf_reset(&sb); > - strbuf_addf(&sb, "%*s/", (int)len, str); > + strbuf_addstr(&sb, str); > + strbuf_addch(&sb, '/'); > puts(sb.buf); > } > return 0; > b17fb99bf7e (HEAD -> master) add > diff --git a/t/helper/test-online-cpus.c b/t/helper/test-online-cpus.c > index 2716b44ca15..5e52b622c4d 100644 > --- a/t/helper/test-online-cpus.c > +++ b/t/helper/test-online-cpus.c > @@ -7,10 +7,11 @@ int cmd__online_cpus(int argc, const char **argv) > { > struct strbuf sb = STRBUF_INIT; > const char *const str = "Hello, World"; > + const size_t len = strlen(str); > > for (size_t i = 0; i < 10000000; i++) { > strbuf_reset(&sb); > - strbuf_addstr(&sb, str); > + strbuf_add(&sb, str, len); > strbuf_addch(&sb, '/'); > puts(sb.buf); > }