Re: [PATCH] cache-tree: fix strbuf growth in prime_cache_tree_rec()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 06.02.23 um 17:18 schrieb Ævar Arnfjörð Bjarmason:
>
> On Mon, Feb 06 2023, Derrick Stolee wrote:
>
>> On 2/5/2023 4:12 PM, Ævar Arnfjörð Bjarmason wrote:
>
>>> Or even just:
>>>
>>> 	strbuf_addf(tree_path, "%*.s/", (int)entry.pathlen, entry.path);
>>
>> Please do not add "addf" functions that can be run in tight loops.
>> It's faster to do strbuf_add() followed by strbuf_addch().
>
> Good point.
>
> I wondered just how much slower, and it's up to 3x! At least according
> to this[1] artificial test case (where I usurped a random test helper).
>
> I wondered if we could just handle some common strbuf_addf() cases
> ourselves, and the benchmark shows (manually annotated, too lazy to set
> up the -n option):
>
> 	git hyperfine -L rev HEAD~5,HEAD~4,HEAD~3,HEAD~2,HEAD~1,HEAD~0 -s 'make CFLAGS=-O3' './t/helper/test-tool online-cpus' -r 3
> 	[...]
> 	Summary
> 	  './t/helper/test-tool online-cpus' in 'HEAD~0' ran <== strbuf_add() + strbuf_addch()
> 	    1.06 ± 0.11 times faster than './t/helper/test-tool online-cpus' in 'HEAD~1' <== strbuf_addstr() + strbuf_addch()
> 	    1.18 ± 0.12 times faster than './t/helper/test-tool online-cpus' in 'HEAD~4' <== hand optimized strbuf_addf() for "%sC"
> 	    1.33 ± 0.18 times faster than './t/helper/test-tool online-cpus' in 'HEAD~2' <== hand optimized strbuf_addf() for "%*sC"
> 	    2.63 ± 0.05 times faster than './t/helper/test-tool online-cpus' in 'HEAD~5' <== strbuf_addf("%s/")
> 	    2.92 ± 0.25 times faster than './t/helper/test-tool online-cpus' in 'HEAD~3' <== strbuf_addf("%*s/")

Woah!

> The "hand optimization" just being a very stupid handling of "%sC" for
> arbitrary values of a single char "C", and ditto for "%*sC" (which
> curiously is slower here).

"%*s" adds padding if needed, your version doesn't.  Perhaps you thought
of "%.*s"?  That might be relevant because for "%*s" vsnprintf(3) needs
to run strlen(3) again on the argument, while for "%.*s" it can stop
when the given length is reached.

> So, for truly hot loops we'd still want to use the add + addch, but if
> anyone's interested (hashtag #leftoverbits) it looks like we could get
> some easy wins (and reduction in code size, as we could stop worrying
> about addf being slow in most cases) with some very dumb minimal
> vaddf(), which could handle these cases (but not anything involving
> padding etc.).
>
> I didn't dig, but wouldn't be surprised if the reason is that C
> libraries need to carry a relatively fat & general sprintf() for all
> those edge cases, locale handling etc, whereas most of our use could
> trivially be represented as some sequence of addstr()/addf() etc.

If that's the reason then resisting the urge to handle ever more cases
in strbuf_addf() would be quite important.

> Another interesting approach (and this is very #leftoverbits) would be
> to perform the same optimization with coccinelle.
>
> I.e. our current use of it is purely "this code X should be written like
> Y, and we should commit Y".
>
> But there's no reason for why we couldn't effectively implement our own
> compiler optimizations for our own APIs with it, so just grab "%s/" etc,
> unpack that in OCaml, then emit strbuf_add() + strbuf_addch(), and that
> would be what the C compiler would see.

Extracting the %s is technically possible using a semantic patch without
scripting:

   @@
   expression sb, str;
   format fmt =~ "^s$";
   @@
   + strbuf_addstr(sb, str);
     strbuf_addf(sb, "%@fmt@..."
   - , str
   + + 2
     );

The "+ 2" is ugly and runs afoul of compiler warning -Wstring-plus-int,
though.  Resolving this probably requires Python scripting as in
https://github.com/coccinelle/coccinelle/blob/master/demos/format.cocci,
or the OCaml magic you have in mind.  I have to admit that I don't even
understand the linked examples, however. :-/

The warning can be avoided by using an array subscription, by the way,
but that's even uglier:

   @@
   expression sb, str;
   format fmt =~ "^s$";
   @@
   + strbuf_addstr(sb, str);
     strbuf_addf(sb,
   + &
     "%@fmt@..."
   - , str
   + [2]
     );

>
> 1.
>
> 	9d23ffb1117 addf + nolen
> 	diff --git a/t/helper/test-online-cpus.c b/t/helper/test-online-cpus.c
> 	index 8cb0d53840f..c802ec579d0 100644
> 	--- a/t/helper/test-online-cpus.c
> 	+++ b/t/helper/test-online-cpus.c
> 	@@ -1,9 +1,17 @@
> 	 #include "test-tool.h"
> 	 #include "git-compat-util.h"
> 	 #include "thread-utils.h"
> *	+#include "strbuf.h"
>
> 	 int cmd__online_cpus(int argc, const char **argv)
> 	 {
> 	-	printf("%d\n", online_cpus());
> 	+	struct strbuf sb = STRBUF_INIT;
> 	+	const char *const str = "Hello, World";
> 	+
> 	+	for (size_t i = 0; i < 10000000; i++) {
> 	+		strbuf_reset(&sb);
> 	+		strbuf_addf(&sb, "%s/", str);
> 	+		puts(sb.buf);
> 	+	}
> 	 	return 0;
> 	 }
> 	9f74eff5623 addf + nolen optimize
> 	diff --git a/strbuf.c b/strbuf.c
> 	index c383f41a3c5..750e5e6a5b4 100644
> 	--- a/strbuf.c
> 	+++ b/strbuf.c
> 	@@ -332,8 +332,16 @@ void strbuf_addchars(struct strbuf *sb, int c, size_t n)
> 	 void strbuf_addf(struct strbuf *sb, const char *fmt, ...)
> 	 {
> 	 	va_list ap;
> 	+
> 	 	va_start(ap, fmt);
> 	-	strbuf_vaddf(sb, fmt, ap);
> 	+	if (*fmt == '%' && *(fmt + 1) == 's' && *(fmt + 2) && !*(fmt + 3)) {
> 	+		const char *arg = va_arg(ap, const char *);
> 	+
> 	+		strbuf_addstr(sb, arg);
> 	+		strbuf_addch(sb, *(fmt + 2));
> 	+	} else {
> 	+		strbuf_vaddf(sb, fmt, ap);
> 	+	}
> 	 	va_end(ap);
> 	 }
>
> 	ca60bb9b479 addf + len
> 	diff --git a/t/helper/test-online-cpus.c b/t/helper/test-online-cpus.c
> 	index c802ec579d0..7257e622015 100644
> 	--- a/t/helper/test-online-cpus.c
> 	+++ b/t/helper/test-online-cpus.c
> 	@@ -7,10 +7,11 @@ int cmd__online_cpus(int argc, const char **argv)
> 	 {
> 	 	struct strbuf sb = STRBUF_INIT;
> 	 	const char *const str = "Hello, World";
> 	+	const size_t len = strlen(str);
>
> 	 	for (size_t i = 0; i < 10000000; i++) {
> 	 		strbuf_reset(&sb);
> 	-		strbuf_addf(&sb, "%s/", str);
> 	+		strbuf_addf(&sb, "%*s/", (int)len, str);
> 	 		puts(sb.buf);
> 	 	}
> 	 	return 0;
> 	1f47987d095 addf + len optimize
> 	diff --git a/strbuf.c b/strbuf.c
> 	index 750e5e6a5b4..88801268f7a 100644
> 	--- a/strbuf.c
> 	+++ b/strbuf.c
> 	@@ -334,11 +334,16 @@ void strbuf_addf(struct strbuf *sb, const char *fmt, ...)
> 	 	va_list ap;
>
> 	 	va_start(ap, fmt);
> 	-	if (*fmt == '%' && *(fmt + 1) == 's' && *(fmt + 2) && !*(fmt + 3)) {
> 	+	if (*fmt == '%' &&
> 	+	    *(fmt + 1) == '*' &&
> 	+	    *(fmt + 2) == 's' &&
> 	+	    *(fmt + 3) &&
> 	+	    !*(fmt + 4)) {
> 	+		int len = va_arg(ap, int);
> 	 		const char *arg = va_arg(ap, const char *);
>
> 	-		strbuf_addstr(sb, arg);
> 	-		strbuf_addch(sb, *(fmt + 2));
> 	+		strbuf_add(sb, arg, len);
> 	+		strbuf_addch(sb, *(fmt + 3));
> 	 	} else {
> 	 		strbuf_vaddf(sb, fmt, ap);
> 	 	}
> 	55c698c0b95 addstr
> 	diff --git a/t/helper/test-online-cpus.c b/t/helper/test-online-cpus.c
> 	index 7257e622015..2716b44ca15 100644
> 	--- a/t/helper/test-online-cpus.c
> 	+++ b/t/helper/test-online-cpus.c
> 	@@ -7,11 +7,11 @@ int cmd__online_cpus(int argc, const char **argv)
> 	 {
> 	 	struct strbuf sb = STRBUF_INIT;
> 	 	const char *const str = "Hello, World";
> 	-	const size_t len = strlen(str);
>
> 	 	for (size_t i = 0; i < 10000000; i++) {
> 	 		strbuf_reset(&sb);
> 	-		strbuf_addf(&sb, "%*s/", (int)len, str);
> 	+		strbuf_addstr(&sb, str);
> 	+		strbuf_addch(&sb, '/');
> 	 		puts(sb.buf);
> 	 	}
> 	 	return 0;
> 	b17fb99bf7e (HEAD -> master) add
> 	diff --git a/t/helper/test-online-cpus.c b/t/helper/test-online-cpus.c
> 	index 2716b44ca15..5e52b622c4d 100644
> 	--- a/t/helper/test-online-cpus.c
> 	+++ b/t/helper/test-online-cpus.c
> 	@@ -7,10 +7,11 @@ int cmd__online_cpus(int argc, const char **argv)
> 	 {
> 	 	struct strbuf sb = STRBUF_INIT;
> 	 	const char *const str = "Hello, World";
> 	+	const size_t len = strlen(str);
>
> 	 	for (size_t i = 0; i < 10000000; i++) {
> 	 		strbuf_reset(&sb);
> 	-		strbuf_addstr(&sb, str);
> 	+		strbuf_add(&sb, str, len);
> 	 		strbuf_addch(&sb, '/');
> 	 		puts(sb.buf);
> 	 	}




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux