Re: [PATCH] cache-tree: fix strbuf growth in prime_cache_tree_rec()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 06 2023, Derrick Stolee wrote:

> On 2/5/2023 4:12 PM, Ævar Arnfjörð Bjarmason wrote:
> [...]
>> One wonders if (even for this index-related code) we really need such
>> careful management of growth, and could instead do with:
>> 
>> 	strbuf_setlen(tree_path, base_path_len);
>> 	strbuf_add(tree_path, entry.path, entry.pathlen);
>> 	strbuf_addch(tree_path, '/');
>
> This would be my preferred way to go here.

*nod*

>> Or even just:
>> 
>> 	strbuf_addf(tree_path, "%*.s/", (int)entry.pathlen, entry.path);
>
> Please do not add "addf" functions that can be run in tight loops.
> It's faster to do strbuf_add() followed by strbuf_addch().

Good point.

I wondered just how much slower, and it's up to 3x! At least according
to this[1] artificial test case (where I usurped a random test helper).

I wondered if we could just handle some common strbuf_addf() cases
ourselves, and the benchmark shows (manually annotated, too lazy to set
up the -n option):

	git hyperfine -L rev HEAD~5,HEAD~4,HEAD~3,HEAD~2,HEAD~1,HEAD~0 -s 'make CFLAGS=-O3' './t/helper/test-tool online-cpus' -r 3
	[...]
	Summary
	  './t/helper/test-tool online-cpus' in 'HEAD~0' ran <== strbuf_add() + strbuf_addch()
	    1.06 ± 0.11 times faster than './t/helper/test-tool online-cpus' in 'HEAD~1' <== strbuf_addstr() + strbuf_addch()
	    1.18 ± 0.12 times faster than './t/helper/test-tool online-cpus' in 'HEAD~4' <== hand optimized strbuf_addf() for "%sC"
	    1.33 ± 0.18 times faster than './t/helper/test-tool online-cpus' in 'HEAD~2' <== hand optimized strbuf_addf() for "%*sC"
	    2.63 ± 0.05 times faster than './t/helper/test-tool online-cpus' in 'HEAD~5' <== strbuf_addf("%s/")
	    2.92 ± 0.25 times faster than './t/helper/test-tool online-cpus' in 'HEAD~3' <== strbuf_addf("%*s/")

The "hand optimization" just being a very stupid handling of "%sC" for
arbitrary values of a single char "C", and ditto for "%*sC" (which
curiously is slower here).

So, for truly hot loops we'd still want to use the add + addch, but if
anyone's interested (hashtag #leftoverbits) it looks like we could get
some easy wins (and reduction in code size, as we could stop worrying
about addf being slow in most cases) with some very dumb minimal
vaddf(), which could handle these cases (but not anything involving
padding etc.).

I didn't dig, but wouldn't be surprised if the reason is that C
libraries need to carry a relatively fat & general sprintf() for all
those edge cases, locale handling etc, whereas most of our use could
trivially be represented as some sequence of addstr()/addf() etc.

Another interesting approach (and this is very #leftoverbits) would be
to perform the same optimization with coccinelle.

I.e. our current use of it is purely "this code X should be written like
Y, and we should commit Y".

But there's no reason for why we couldn't effectively implement our own
compiler optimizations for our own APIs with it, so just grab "%s/" etc,
unpack that in OCaml, then emit strbuf_add() + strbuf_addch(), and that
would be what the C compiler would see.

1.
	
	9d23ffb1117 addf + nolen
	diff --git a/t/helper/test-online-cpus.c b/t/helper/test-online-cpus.c
	index 8cb0d53840f..c802ec579d0 100644
	--- a/t/helper/test-online-cpus.c
	+++ b/t/helper/test-online-cpus.c
	@@ -1,9 +1,17 @@
	 #include "test-tool.h"
	 #include "git-compat-util.h"
	 #include "thread-utils.h"
*	+#include "strbuf.h"
	 
	 int cmd__online_cpus(int argc, const char **argv)
	 {
	-	printf("%d\n", online_cpus());
	+	struct strbuf sb = STRBUF_INIT;
	+	const char *const str = "Hello, World";
	+
	+	for (size_t i = 0; i < 10000000; i++) {
	+		strbuf_reset(&sb);
	+		strbuf_addf(&sb, "%s/", str);
	+		puts(sb.buf);
	+	}
	 	return 0;
	 }
	9f74eff5623 addf + nolen optimize
	diff --git a/strbuf.c b/strbuf.c
	index c383f41a3c5..750e5e6a5b4 100644
	--- a/strbuf.c
	+++ b/strbuf.c
	@@ -332,8 +332,16 @@ void strbuf_addchars(struct strbuf *sb, int c, size_t n)
	 void strbuf_addf(struct strbuf *sb, const char *fmt, ...)
	 {
	 	va_list ap;
	+
	 	va_start(ap, fmt);
	-	strbuf_vaddf(sb, fmt, ap);
	+	if (*fmt == '%' && *(fmt + 1) == 's' && *(fmt + 2) && !*(fmt + 3)) {
	+		const char *arg = va_arg(ap, const char *);
	+
	+		strbuf_addstr(sb, arg);
	+		strbuf_addch(sb, *(fmt + 2));
	+	} else {
	+		strbuf_vaddf(sb, fmt, ap);
	+	}
	 	va_end(ap);
	 }
	 
	ca60bb9b479 addf + len
	diff --git a/t/helper/test-online-cpus.c b/t/helper/test-online-cpus.c
	index c802ec579d0..7257e622015 100644
	--- a/t/helper/test-online-cpus.c
	+++ b/t/helper/test-online-cpus.c
	@@ -7,10 +7,11 @@ int cmd__online_cpus(int argc, const char **argv)
	 {
	 	struct strbuf sb = STRBUF_INIT;
	 	const char *const str = "Hello, World";
	+	const size_t len = strlen(str);
	 
	 	for (size_t i = 0; i < 10000000; i++) {
	 		strbuf_reset(&sb);
	-		strbuf_addf(&sb, "%s/", str);
	+		strbuf_addf(&sb, "%*s/", (int)len, str);
	 		puts(sb.buf);
	 	}
	 	return 0;
	1f47987d095 addf + len optimize
	diff --git a/strbuf.c b/strbuf.c
	index 750e5e6a5b4..88801268f7a 100644
	--- a/strbuf.c
	+++ b/strbuf.c
	@@ -334,11 +334,16 @@ void strbuf_addf(struct strbuf *sb, const char *fmt, ...)
	 	va_list ap;
	 
	 	va_start(ap, fmt);
	-	if (*fmt == '%' && *(fmt + 1) == 's' && *(fmt + 2) && !*(fmt + 3)) {
	+	if (*fmt == '%' &&
	+	    *(fmt + 1) == '*' &&
	+	    *(fmt + 2) == 's' &&
	+	    *(fmt + 3) &&
	+	    !*(fmt + 4)) {
	+		int len = va_arg(ap, int);
	 		const char *arg = va_arg(ap, const char *);
	 
	-		strbuf_addstr(sb, arg);
	-		strbuf_addch(sb, *(fmt + 2));
	+		strbuf_add(sb, arg, len);
	+		strbuf_addch(sb, *(fmt + 3));
	 	} else {
	 		strbuf_vaddf(sb, fmt, ap);
	 	}
	55c698c0b95 addstr
	diff --git a/t/helper/test-online-cpus.c b/t/helper/test-online-cpus.c
	index 7257e622015..2716b44ca15 100644
	--- a/t/helper/test-online-cpus.c
	+++ b/t/helper/test-online-cpus.c
	@@ -7,11 +7,11 @@ int cmd__online_cpus(int argc, const char **argv)
	 {
	 	struct strbuf sb = STRBUF_INIT;
	 	const char *const str = "Hello, World";
	-	const size_t len = strlen(str);
	 
	 	for (size_t i = 0; i < 10000000; i++) {
	 		strbuf_reset(&sb);
	-		strbuf_addf(&sb, "%*s/", (int)len, str);
	+		strbuf_addstr(&sb, str);
	+		strbuf_addch(&sb, '/');
	 		puts(sb.buf);
	 	}
	 	return 0;
	b17fb99bf7e (HEAD -> master) add
	diff --git a/t/helper/test-online-cpus.c b/t/helper/test-online-cpus.c
	index 2716b44ca15..5e52b622c4d 100644
	--- a/t/helper/test-online-cpus.c
	+++ b/t/helper/test-online-cpus.c
	@@ -7,10 +7,11 @@ int cmd__online_cpus(int argc, const char **argv)
	 {
	 	struct strbuf sb = STRBUF_INIT;
	 	const char *const str = "Hello, World";
	+	const size_t len = strlen(str);
	 
	 	for (size_t i = 0; i < 10000000; i++) {
	 		strbuf_reset(&sb);
	-		strbuf_addstr(&sb, str);
	+		strbuf_add(&sb, str, len);
	 		strbuf_addch(&sb, '/');
	 		puts(sb.buf);
	 	}




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux