Re: eBPF CO-RE cross-compilation for 32-bit ARM platforms

Jakov Petrina <jakov.petrina@xxxxxxxxxx> · Mon, 10 Aug 2020 10:56:54 +0200

Hi,

On 07/08/2020 21:46, Andrii Nakryiko wrote:
The first showstopper for cross-compiling the aforementioned example on
the ARM 32-bit platform has been the generation of the required
`vmlinux.h` kernel header from the BTF information. More specifically,
our initial approach to have e.g. a compilation target dependency which
would invoke `bpftool` at configure time was not appropriate due to
several issues: a) CO-RE requires host kernel to have been compiled in
such a way as to expose BTF information, which may not be available, and b) the
generated `vmlinux.h` was actually architecture-specific.

That's not exactly true, about "CO-RE requires host kernel to have
been compiled...". You can pass any kernel image as a parameter to
bpftool as an input to generate vmlinux.h for that target
architecture. The only limitation right now, I think, is that their
endianness has to match. We'll probably get over this limitation some
time by the end of this year, though.
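
As a rough illustration of the endianness constraint, the sketch below
reads the identification bytes of an ELF kernel image and compares the
declared byte order with that of the machine running the check. It
assumes the image is an uncompressed ELF `vmlinux` and that "match"
means matching the build host; the helper names are hypothetical and
are not part of bpftool or libbpf.

```c
#include <stdio.h>
#include <string.h>

/* Offsets and values taken from the ELF identification bytes (e_ident). */
enum {
    ELF_IDENT_LEN   = 6,
    ELF_DATA_OFFSET = 5, /* EI_DATA */
    ELF_DATA_LSB    = 1, /* ELFDATA2LSB, little-endian */
    ELF_DATA_MSB    = 2  /* ELFDATA2MSB, big-endian */
};

/* Returns ELF_DATA_LSB or ELF_DATA_MSB, or 0 if ident is not a usable
 * ELF identification header. */
static int elf_ident_endianness(const unsigned char *ident, size_t len)
{
    if (len < ELF_IDENT_LEN || memcmp(ident, "\x7f" "ELF", 4) != 0)
        return 0;
    if (ident[ELF_DATA_OFFSET] == ELF_DATA_LSB ||
        ident[ELF_DATA_OFFSET] == ELF_DATA_MSB)
        return ident[ELF_DATA_OFFSET];
    return 0;
}

/* Detects the byte order of the machine this code runs on. */
static int host_endianness(void)
{
    const unsigned short probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1 ? ELF_DATA_LSB : ELF_DATA_MSB;
}

/* Hypothetical helper: 1 if the image and the host agree on byte order,
 * 0 if they differ, -1 if the file cannot be read or is not ELF. */
static int image_matches_host_endianness(const char *path)
{
    unsigned char ident[ELF_IDENT_LEN];
    FILE *f = fopen(path, "rb");
    size_t n;
    int image;

    if (!f)
        return -1;
    n = fread(ident, 1, sizeof(ident), f);
    fclose(f);

    image = elf_ident_endianness(ident, n);
    if (image == 0)
        return -1;
    return image == host_endianness();
}
```

A build script could call `image_matches_host_endianness()` on each
target image before trying to generate a header from it, and skip or
flag the ones that return 0.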

Ah, I was not aware this was possible, thanks; it will certainly cut 
down on the time it takes to generate headers for other arches.

So in your case, I'd recommend generating a per-architecture vmlinux.h
and using the appropriate one when you cross-compile. I don't think we
ever intended to support a single CO-RE BPF binary across architectures,
given it's not too bad to compile the same code once for each target
architecture. Compiling once for each kernel version/variant was a much
bigger problem, which is what we tackled.
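
One way to pick the per-architecture header at compile time is a small
preprocessor switch. This is only a sketch: it assumes the build passes
a `__TARGET_ARCH_*` define on the compiler command line (for example
`-D__TARGET_ARCH_arm`), and the header file names are hypothetical.

```c
/* Select the pre-generated header for the target architecture.
 * Assumption: exactly one __TARGET_ARCH_* macro is defined by the build;
 * x86 is used as the fallback when none is given. */
#if defined(__TARGET_ARCH_arm)
#define VMLINUX_HEADER "vmlinux.arm.h"
#elif defined(__TARGET_ARCH_arm64)
#define VMLINUX_HEADER "vmlinux.arm64.h"
#elif defined(__TARGET_ARCH_s390)
#define VMLINUX_HEADER "vmlinux.s390.h"
#else
#define VMLINUX_HEADER "vmlinux.x86.h"
#endif

/* In a real BPF source file the selection would be consumed with
 * "#include VMLINUX_HEADER"; here the name is only exposed so that
 * the choice can be inspected. */
static const char *vmlinux_header_name(void)
{
    return VMLINUX_HEADER;
}
```

With this in place the BPF source stays identical across targets and
only the define passed by the build system changes.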

Agreed, kernel compatibility is a bit more crucial here; we are 
comfortable with handling cross-compilation for other arches.

However, there are certainly drawbacks to this approach: a) (relatively)
large file size of the generated headers, b) regular maintenance to
re-generate the header files for various architectures and kernel
versions, and c) incompatible definitions being generated, to name a
few. This last point relates to the fact that our `aarch64`/`arm64`
kernel generates the following definition using `bpftool`, which has
resulted in compilation failure:

```
typedef __Poly8_t poly8x16_t[16];
```

AFAICT these are ARM NEON intrinsic definitions which are GCC-specific.
We have opted to comment out this line as there was no additional
`poly8x16_t` usage in the header file.

Ok, so for a) why is the size of vmlinux.h a big factor? You use it on
the host machine during compilation only; after that you don't have to
distribute it anywhere. I just checked the size of the vmlinux.h we use
to write BPF programs for production, and it's 2.5MB. Having even a few
of those (if you need x86 + ARM32 + ARM64 + s390x + whatever) isn't a big
deal, IMO; you can just check them into your source control system.
If the size is a concern, I'd be curious to hear why.

Yup, we currently have these files included with our source and it 
hasn't been that bad. However, it struck us as not the most elegant 
solution given the fact that these are large pre-generated files which 
require manual intervention to update.

However, given that a running kernel is not necessary to create these 
files perhaps we might develop internal tooling to make this process as 
easy as possible.

b) Hm.. how often do you intend to re-generate them? Unless you are
using some bleeding-edge and volatile features of kernel and/or
compiled-in drivers, you shouldn't need to re-generate it all that
often. Maybe once every kernel release, maybe even less frequently. We
update those vmlinux.h only when there is some new set of features
(e.g., bpf_iter) added and we need those types, or when we get a new
major kernel version bump. So far so good. But your constraints might
differ, so I'd like to learn more.

We are currently looking into bleeding-edge features of the kernel, but 
they mostly concern eBPF itself; I suppose that for us, updating these 
headers should be done when new features are introduced to the kernel. 
When we identify applications of eBPF we will most likely have more 
constraints to keep track of.

c) I addressed in another reply. BTF dumper in libbpf maintains a list
of types that are compiler-provided and avoids generating types for
them, assuming compiler will have them. So far we've handled it simply
for __builtin_va_list, we can probably do something like that here as
well?
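
The skip-list idea described above could look roughly like the sketch
below. This is not libbpf's actual code: the table and function names
are hypothetical, `__builtin_va_list` is the only entry taken from this
thread, and the `__Poly8_t` entry is an assumption about how the same
mechanism might be extended.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Type names the header generator would not emit definitions for,
 * on the assumption that the compiler already provides them. */
static const char *const compiler_provided_types[] = {
    "__builtin_va_list", /* the case mentioned in the thread */
    "__Poly8_t",         /* assumption: GCC NEON type handled the same way */
};

/* Returns true if a definition for this type name should be skipped. */
static bool is_compiler_provided_type(const char *name)
{
    size_t i;
    const size_t count = sizeof(compiler_provided_types) /
                         sizeof(compiler_provided_types[0]);

    if (name == NULL)
        return false;
    for (i = 0; i < count; i++) {
        if (strcmp(name, compiler_provided_types[i]) == 0)
            return true;
    }
    return false;
}
```

A dumper would consult `is_compiler_provided_type()` before emitting
each named type. Note that skipping only helps if the compiler used to
build the BPF program really does know the type, which may not hold for
GCC-specific names.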

Great, I think that is an acceptable solution.

Given various issues we have encountered so far (among which is a kernel
panic/crash on a specific device), additional input and feedback
regarding cross-compilation of the eBPF utilities would be greatly
appreciated.

Please report the panic with more details separately. If you are
referring to cross-compiling libbpf-tools in BCC repo, we can play
with that, generate a separate vmlinux.<arch>.h. It's a bit hard for
me to test as I don't have easy access to anything beyond x86-64, so
some help from other folks would be very appreciated.

Thanks, as mentioned in another reply we have been attempting to 
reproduce this issue in a QEMU ARM environment but so far we haven't 
been successful. We will most likely move over to debugging it directly 
on our target hardware and report it when we have more information.

Regards,
--
Jakov Petrina