Re: CO-RE builtins purity and other compiler optimizations

Yonghong Song <yhs@xxxxxxxx> · Wed, 5 Jul 2023 22:57:09 -0700

On 7/5/23 5:02 PM, Andrii Nakryiko wrote:
On Wed, Jul 5, 2023 at 11:07 AM Jose E. Marchesi
<jose.marchesi@xxxxxxxxxx> wrote:

Hello BPF people!

We are still working on supporting the pending CO-RE built-ins in GCC.
The trick of hooking in the parser to avoid constant folding, as
discussed during LSFMMBPF, seems to work well.  Almost there!

So, most of the CO-RE associated C built-ins have the side effect of
emitting a CO-RE relocation in the .BTF.ext section.  This is the case,
for example, for __builtin_preserve_enum_value.

Like calls to regular functions, calls to C built-ins are also
candidates for certain optimizations.  For example, given this code:

: int a = __builtin_preserve_enum_value(*(typeof(enum E) *)eB, BPF_ENUMVAL_VALUE);
: int b = __builtin_preserve_enum_value(*(typeof(enum E) *)eB, BPF_ENUMVAL_VALUE);

The compiler may very well decide to optimize out the second call to the
built-in if the built-in is considered "pure", i.e. given exactly the
same arguments it produces the same result.

We observed that clang indeed seems to optimize that way.  See
https://godbolt.org/z/zqe9Kfrrj .

That kind of optimization has an impact on the number of CO-RE
relocations emitted.

Question:

Is the BPF loader, the BPF verifier or any other core component sensitive
in any way to the number (and ordering) of CO-RE relocations for some
given BPF C program?  I.e., if the same BPF C program above is compiled
with and without that optimization, will it work in both cases?

Yes, it should.

If no, then perfect!  Different compilers can optimize slightly

Did you mean "if yes, then perfect"? Because otherwise it makes no sense :)

differently (or not optimize at all) and we can mark these built-ins as
pure in GCC as well, benefiting from optimizations without having to
worry about emitting exactly what clang emits.

Yes, it should be fine, as long as the compiler doesn't assume any
specific value returned by CO-RE relocation (and doesn't perform any
optimizations based on that assumed value). From the BPF verifier
side, it's just a constant, so the BPF verifier itself doesn't care.
From the libbpf/BPF loader standpoint, all that matters is that there
is CO-RE relocation information that specifies how some BPF
instruction needs to be adjusted to match the host kernel properly.
Whether a CO-RE relocation is repeated many times, or performed just
once with that constant value then reused in the code many times,
shouldn't matter at all.
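
The point about the loader can be illustrated with a toy model. This is only a sketch under stated assumptions: `toy_insn`, `toy_core_relo` and the functions below are hypothetical names invented for illustration, not libbpf or kernel data structures. Each relocation record patches one instruction independently, so a program with two relocated loads and a program with one relocated load whose result is reused end up with the same values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: hypothetical types, not libbpf's real structures. */
struct toy_insn {
    int dst_reg;   /* register that receives the immediate */
    int32_t imm;   /* placeholder value emitted by the compiler */
};

struct toy_core_relo {
    size_t insn_idx;   /* which instruction to patch */
    int32_t host_val;  /* value resolved against the host kernel */
};

/* Patch every relocated instruction; records do not depend on each other. */
static void toy_apply_relos(struct toy_insn *insns, size_t n_insns,
                            const struct toy_core_relo *relos, size_t n_relos)
{
    for (size_t i = 0; i < n_relos; i++) {
        assert(relos[i].insn_idx < n_insns);
        insns[relos[i].insn_idx].imm = relos[i].host_val;
    }
}

/* No CSE: two loads, two relocation records. */
void toy_run_two_relos(int32_t host_val, int32_t *a, int32_t *b)
{
    struct toy_insn insns[2] = { { 1, 0 }, { 2, 0 } };
    struct toy_core_relo relos[2] = { { 0, host_val }, { 1, host_val } };

    toy_apply_relos(insns, 2, relos, 2);
    *a = insns[0].imm;
    *b = insns[1].imm;
}

/* With CSE: one load, one relocation record, value reused for a and b. */
void toy_run_one_relo(int32_t host_val, int32_t *a, int32_t *b)
{
    struct toy_insn insns[1] = { { 1, 0 } };
    struct toy_core_relo relos[1] = { { 0, host_val } };

    toy_apply_relos(insns, 1, relos, 1);
    *a = insns[0].imm;
    *b = insns[0].imm;
}
```

Calling both functions with the same resolved value yields the same `a` and `b`, which is the property the question asks about.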

For cases like this:

: int a = __builtin_preserve_enum_value(*(typeof(enum E) *)eB, BPF_ENUMVAL_VALUE);
: int b = __builtin_preserve_enum_value(*(typeof(enum E) *)eB, BPF_ENUMVAL_VALUE);

Internally, LLVM (in one BPF backend pass) converts
  __builtin_preserve_enum_value(*(typeof(enum E) *)eB, BPF_ENUMVAL_VALUE)
to a global variable based on the captured information: builtin,
type, value, etc.

Since 'int a = ...' and 'int b = ...' have the same value,
the same BPF backend pass creates only one global variable,
effectively performing CSE.

GCC might implement this differently, but for the same
built-in, type, and source representation, CSE
should be okay.
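
As a rough illustration of why identical built-in calls collapse into one, here is a minimal sketch of interning by key. It is not LLVM's actual implementation: `toy_relo_key`, `toy_relo_table` and `toy_intern_relo` are hypothetical names, and the key fields merely stand in for the "builtin, type, value, etc." information mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; this is not LLVM's actual code. */
#define TOY_MAX_RELOS 16

struct toy_relo_key {
    const char *builtin;   /* e.g. "preserve_enum_value" */
    const char *type;      /* e.g. "enum E" */
    const char *value;     /* e.g. "eB" */
    int flag;              /* integer stand-in for the second argument */
};

struct toy_relo_table {
    struct toy_relo_key keys[TOY_MAX_RELOS];
    size_t count;          /* number of distinct "globals" created */
};

static int toy_key_equal(const struct toy_relo_key *x,
                         const struct toy_relo_key *y)
{
    return x->flag == y->flag &&
           strcmp(x->builtin, y->builtin) == 0 &&
           strcmp(x->type, y->type) == 0 &&
           strcmp(x->value, y->value) == 0;
}

/* Return the index of the entry for this key, creating it on first use.
 * Two calls with an equal key therefore share one entry. */
size_t toy_intern_relo(struct toy_relo_table *t, struct toy_relo_key key)
{
    for (size_t i = 0; i < t->count; i++)
        if (toy_key_equal(&t->keys[i], &key))
            return i;
    assert(t->count < TOY_MAX_RELOS);
    t->keys[t->count] = key;
    return t->count++;
}
```

Interning the key for 'int a = ...' and then the identical key for 'int b = ...' returns the same index and leaves a single entry in the table, while a key that differs in any field gets its own entry.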

If yes, wouldn't it be better to disable that kind of optimization in
all C BPF compilers, i.e. to make the compilers aware of the side effect
so they will not optimize built-in calls out (or replicate them), and to
make this mandatory in the CO-RE spec?  Making one compiler optimize
exactly like another compiler is difficult and sometimes not even
feasible.

Thanks in advance for the clarification/info!