Re: [PATCH] for_each_string_list_item(): behave correctly for empty list

Kaartic Sivaraam <kaarticsivaraam91196@xxxxxxxxx> · Wed, 20 Sep 2017 13:05:01 +0530

Hi,

Though this thread seems to have reached a conclusion, I just wanted to
know what I was missing about the optimisation.

On Wednesday 20 September 2017 08:00 AM, Jonathan Nieder wrote:
> From that link:
>      for ( ;valid_int && *valid_int < 10; (*valid_int)++) {
>          printf("Valid instance");
>      }
>
> Both gcc and clang are able to optimize out the 'valid_int &&' because
> it is dereferenced on the RHS of the &&.
>
> For comparison, 'item < (list)->items + (list)->nr' does not
> dereference (list)->items.  So that optimization doesn't apply here.
>
> A smart compiler could be able to take advantage of there being no
> object pointed to by a null pointer, which means
>
>	item < (list)->items + (list)->nr
>
> is always false when (list)->items is NULL, which in turn makes a
> '(list)->items &&' test redundant.  But a quick test with gcc 4.8.4
> -O2 finds that at least this compiler does not contain such an
> optimization.  The overhead Michael Haggerty mentioned is real.

I thought the compiler optimized that check out of the loop because the
check was "invariant" across loop runs. IOW, the values used in the check
don't change across loop runs, so the compiler decides it's better to do
the check once outside the loop rather than each time inside the loop.
I guess this is some kind of "loop unswitching"[1]. I don't see how
dereferencing influences the optimization here.
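
To make concrete what I mean by doing the check once outside the loop,
here is a minimal sketch. The struct and function names are made up for
this mail (they are not git's), and the second function is only my guess
at the shape a compiler could produce, written by hand:

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not git's definitions. */
struct item { int value; };
struct list { struct item *items; size_t nr; };

/* The NULL guard is part of the loop condition, so as written it is
 * evaluated on every iteration. */
static size_t count_guarded(const struct list *l)
{
	const struct item *it;
	size_t n = 0;

	for (it = l->items; l->items && it < l->items + l->nr; ++it)
		n++;
	return n;
}

/* The same loop with the guard hoisted by hand: l->items is not
 * modified in the body, so testing it once up front is equivalent. */
static size_t count_hoisted(const struct list *l)
{
	const struct item *it;
	size_t n = 0;

	if (!l->items)
		return 0;
	for (it = l->items; it < l->items + l->nr; ++it)
		n++;
	return n;
}
```

Both functions return the same count for any list, including one whose
items pointer is NULL. The hoisting is only valid when the compiler can
prove the body leaves l->items alone; if the body called a function the
compiler cannot see into, I would expect it to be more conservative.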

Just to be sure, I tried once more to see whether the compiler optimizes
this or not. This time with a more similar example and even using the
macro of concern. Surprisingly, the compiler did optimize the check out
of the loop. This time both 'gcc' and 'clang' with an -O1 !

https://godbolt.org/g/Y6rHc1
https://godbolt.org/g/EMrftw
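
For readers who don't want to follow the links, a test of this kind has
roughly the following shape. This is a simplified sketch and not
necessarily the exact code behind the links: the structs are cut-down
stand-ins, and the macro name for_each_item_guarded is hypothetical. It
uses the '(list)->items &&' form of the condition discussed above:

```c
#include <stddef.h>
#include <string.h>

/* Cut-down stand-ins for illustration; the real structs have more
 * fields. */
struct string_list_item { const char *string; void *util; };
struct string_list { struct string_list_item *items; unsigned int nr; };

/* Hypothetical guarded iteration macro: the '(list)->items &&' test
 * keeps the pointer arithmetic from being evaluated on a NULL items
 * pointer. */
#define for_each_item_guarded(item, list) \
	for (item = (list)->items; \
	     (list)->items && item < (list)->items + (list)->nr; \
	     ++item)

/* Sum the lengths of all strings in the list. */
static size_t total_length(struct string_list *list)
{
	struct string_list_item *item;
	size_t total = 0;

	for_each_item_guarded(item, list)
		total += strlen(item->string);
	return total;
}
```

Compiling something like total_length() at -O1 and looking for a test of
list->items inside the loop body of the generated assembly is the kind
of check I did.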

So, is the overhead still real or am I missing something?

[1] : https://en.wikipedia.org/wiki/Loop_unswitching

---
Kaartic