Re: [PATCH 8/8] cache-tree: avoid path comparison loop when silent

René Scharfe <l.s.r@xxxxxx> · Fri, 1 Jan 2021 14:30:02 +0100

Am 31.12.20 um 17:46 schrieb Derrick Stolee:
> On 12/31/2020 7:34 AM, René Scharfe wrote:
>> diff --git a/cache-tree.c b/cache-tree.c
>> index a537a806c1..1105cfe6b7 100644
>> --- a/cache-tree.c
>> +++ b/cache-tree.c
>> @@ -187,10 +187,8 @@ static int verify_cache(struct cache_entry **cache,
>>  		 */
>>  		const char *this_name = cache[i]->name;
>>  		const char *next_name = cache[i+1]->name;
>> -		int this_len = strlen(this_name);
>> -		if (this_len < strlen(next_name) &&
>> -		    strncmp(this_name, next_name, this_len) == 0 &&
>> -		    next_name[this_len] == '/') {
>> +		const char *p;
>> +		if (skip_prefix(next_name, this_name, &p) && *p == '/') {
>>  			if (10 < ++funny) {
>>  				fprintf(stderr, "...\n");
>>  				break;
>
> This performs consistently worse than both cases (see performance test
> later in this message). I think the strlen is saving us some time here.
Thanks for checking!  I was expecting skip_prefix to be faster, because
it stops as soon as it reaches a non-matching character, while strlen
has to always walk to the end.  But consecutive entries of a sorted list
of paths can share long prefixes, in particular if there are long
directory names and directories contain lots of files.

> In fact, I think the compiler is doing some magic to save our strlen
> results, as I get identical performance results when doing this:
>
> diff --git a/cache-tree.c b/cache-tree.c
> index c6c084463b..a076e7cec5 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -156,6 +156,8 @@ static int verify_cache(struct cache_entry **cache,
>  {
>         int i, funny;
>         int silent = flags & WRITE_TREE_SILENT;
> +       const char *this_name;
> +       size_t this_len;
>
>         /* Verify that the tree is merged */
>         funny = 0;
> @@ -182,18 +184,21 @@ static int verify_cache(struct cache_entry **cache,
>          * and they should not exist unless a bug was introduced along
>          * the way.
>          */
> -       if (silent)
> -               return 0;
>         funny = 0;
> -       for (i = 0; i < entries - 1; i++) {
> +
> +       if (!entries)
> +               return 0;
> +       this_name = cache[0]->name;
> +       this_len = strlen(this_name);
> +
> +       for (i = 1; i < entries; i++) {
>                 /* path/file always comes after path because of the way
>                  * the cache is sorted.  Also path can appear only once,
>                  * which means conflicting one would immediately follow.
>                  */
> -               const char *this_name = cache[i]->name;
> -               const char *next_name = cache[i+1]->name;
> -               int this_len = strlen(this_name);
> -               if (this_len < strlen(next_name) &&
> +               const char *next_name = cache[i]->name;
> +               size_t next_len = strlen(next_name);
> +               if (this_len < next_len &&
>                     strncmp(this_name, next_name, this_len) == 0 &&
>                     next_name[this_len] == '/') {
>                         if (10 < ++funny) {
> @@ -203,6 +208,8 @@ static int verify_cache(struct cache_entry **cache,
>                         fprintf(stderr, "You have both %s and %s\n",
>                                 this_name, next_name);
>                 }
> +               this_name = next_name;
> +               this_len = next_len;
>         }
>         if (funny)
>                 return -1;
>

The compiler can do that because strlen() is a pure function; glibc
declares it like this:

   extern size_t strlen (const char *__s)
        __THROW __attribute_pure__ __nonnull ((1));

> HOWEVER, if we swap the order of the conditionals to be
>
> 		if (this_len < next_len &&
> 		    strncmp(this_name, next_name, this_len) == 0 &&
> 		    next_name[this_len] == '/') {
>
> then we _do_ get a measurable, consistent speedup.

That's the original order there, but I get it.  The trailing slash has
only a low probability of being present, so in the common case the
strncmp call can be skipped.  And I guess that check is easier to
handle for the branch predictor as well.

Since we know the length of both strings we can use memcmp instead of
strncmp.  It can compare words instead of bytes, so I'd expect it to
be faster.  Checking the slash first should make that difference moot,
though, as it probably eliminates most of the strncmp calls anyway.

René