Re: [PATCH bpf-next 3/3] selftests/bpf: allow compiling BPF objects without BTF

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Sun, 28 Mar 2021 23:09:23 -0700

On Sun, Mar 28, 2021 at 6:16 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Fri, Mar 26, 2021 at 9:44 AM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> > > Because they double the maintenance cost now and double the support forever.
> > > We never needed to worry about skeleton without BTF and now it would be
> > > a thing ? So all tests realistically need to be doubled: with and without BTF.
> >
> > 2x? Realistically?.. No, I wouldn't say so. :) Extra test or a few
> > might be warranted, but doubling the amount of testing is a huge
> > exaggeration.
>
> It's not only doubling the test coverage, but it would double the development
> effort to keep everything w/BTF and w/o BTF functional.
> So 2x is far from exaggeration.

We must be talking about different things, because I don't understand
where you get all this doubling of development effort from.

>
> > > Even more so for static linking. If one .o has BTF and another doesn't
> > > what linker suppose to do? Keep it, but the linked BTF will sort of cover
> > > both .o-s, but line info in some funcs will be missing.
> > > All these weird combinations would need to be tested.
> >
> > BPF static linker already supports that mode, btw.
>
> Are you considering one-liner commit 78b226d48106 ("libbpf: Skip BTF
> fixup if object file has no BTF")
> as support for static linking w/o BTF?

No, of course not. I'm talking about 8fd27bf69b86 ("libbpf: Add BPF
static linker BTF and BTF.ext support"), which does sensible thing
when some of input object files miss BTF. It still generates valid BTF
with as much data as it is possible to collect from the remaining
object files' BTFs. All the DATASECs will still have variable info and
correct offsets for all the variables that have BTF associated with
them.

78b226d48106 ("libbpf: Skip BTF fixup if object file has no BTF") was
a bug fix for one condition I missed in the original commit. I feel
bad about the inconveniences it caused to you and Jiri, but bugs do
slip up. Which is why I added selftests in this patch, to catch that
next time.

> Jiri's other email pointed out another place in libbpf that breaks w/o BTF.

Jiri's last issue is with add_dummy_ksym_var(), added recently in
kernel function calls support patch set. add_dummy_ksym_var()
shouldn't be called if bpf_object is missing BTF. Previously
find_extern_btf_id() would return -ESRCH in such a case, but
add_dummy_ksym_var() is now called before it.

> The only thing the commit 78b226d48106 achieved is it closed
> the one case of static linker crashing w/o BTF.
> Does linker do anything sensible when a mix of .o with and without BTF
> is given? No.
> It happens not to crash w/o BTF. That's about it.

No, see above.

>
> > And yes, it
> > shouldn't crash the kernel. And you don't need a skeleton or static
> > linker to do that to the kernel, so I don't know how that is a new
> > mode of operation.
> >
> > > The sensible thing to do would be to reject skel generation without BTF
> > > and reject linking without BTF. The user most likely forgot -g during
> > > compilation of bpf prog. The bpftool should give the helpful message
> > > in such case. Whether it's generating skel or linking. Silently proceeding
> > > and generating half-featured skeleton is not what user wanted.
> >
> > Sure, a warning message makes sense. Outright disabling this - not so
> > much. I still can't get why I can't get BPF skeleton and static
> > linking without BTF, if I really want to. Both are useful without BTF.
>
> What are the cases that would benefit?
> So far skeleton was used by tracing progs only. Those that need CO-RE.
> Which means they must have BTF.
> Networking progs didn't need CO-RE and no one bothered adopting
> skeleton because of that.
> Are you implying that a BTF-less skeleton will be useful for
> networking progs? What for?

Yes, I'm outright claiming that it would be useful even without BTF.
skel->progs.my_prog and skel->maps.my_map is extremely useful and cuts
down on boilerplate and accidental prog/map name typos, saving
development time. And it was used for that reason even in
selftests/bpf, just to make tests shorter and nicer. Auto mmap-ed
.data/.rodata, even without BTF, are still usable, if you collect all
your variables into a single struct. Which was the typical case before
the BPF skeleton came about.

So, not having to manually mmap() .rodata/.data/.bss is beneficial.
And it doesn't come at a lot of cost. But again, I'm not advocating
for such an approach, and I'm not saying we should encourage it. But
as I already said, I like tools that don't make unnecessary
assumptions, especially if that doesn't come at huge cost, which I
still think is the case here.

As for why networking folks haven't adopted it. I'd ask networking
folks, but I assume inertia has its place there as well, which is to
be expected and is not bad per se. Changes take time. If they already
have working applications, why bother changing anything?

> So far I see only downsides from increasing the amount of work needed
> to support skel and static linking w/o BTF. The extra 100% or just 1%
> would be equally taxing, since it's extra work for the foreseeable future and
> compounded 1% will add up.
>
> Looking at it from another angle... the skeleton generation existed
> for more than
> a year now and it implicitly required BTF. That requirement was not
> written down.
> Now you're proposing to support skeleton gen w/o BTF when not a single user
> requested it and not providing any advantages of such support while
> ignoring the long term maintenance of such effort.

BPF skeleton works just fine without BTF, if BPF programs don't use
global data. I have no way of knowing how BPF skeleton is used in the
wild, and specifically whether it is used without BTF and
.data/.rodata. But artificially failing BPF skeleton generation now
for such a case would be an obvious regression.

I think those maintenance costs are exaggerated, we are probably not
going to agree on this. I fixed the bug in BPF skeleton making
unnecessary assumptions of BTF DATASEC presence, where it's not
strictly necessary for correctness. I don't see it as an ugly hack or
maintenance burden. It shouldn't need constant attention and shouldn't
break.

>
> Another angle... I did git grep in selftests that use skeleton.
> Only a handful of tests use it as *skel*__open_and_load() w/o global data
> (which would still work w/o BTF)
> and only because we forcefully converted them when skel was introduced.
> bcc/libbpf-tools/* need skel with BTF.
> I couldn't find a _single_ case where people use
> skeleton and don't need BTF driven parts of it.

And I've found a few at Facebook that don't need any BTF and use
open_and_load() and skel->maps and skel->progs accessor. But I don't
know what either of those prove?.. libbpf-tools obviously are using
BTF, because I've set up everything so that BTF is always generated
and I've suggested use of global variables for each converted tool.
But again, how is that an argument for anything. We can't know
everything that people are doing.

I'll say that again. I hate when tools assume how I'm going to use
them. They should do what they are meant to do and don't impose
unnecessary restrictions.

>
> > So I don't know, it's the third different argument I'm addressing
> without any conclusion on the previous two.
>
> So far you haven't addressed the first question:
> Who is asking for a BTF-less skeleton? What for? What features are requested?
> I've seen none of such requests.

No one is asking for that, but they might be already using BTF-less
skeleton. So I'm fixing a bug in bpftool. In a way that doesn't cause
long term maintenance burden. And see above about my stance on tools'
assumptions.

BPF skeleton wasn't added specifically to access global variables.
skel->maps and skel->progs were enough motivation for me to add BPF
skeleton. And still is enough.

>
> BTF is not a debug info of the BPF program.
> If it was then I would agree that compiling with and without -g shouldn't
> make a difference.

It effectively doesn't in a lot of cases (all the "classic" BPF
programs that used kprobes, tracepoints, perfbuf, maps, etc). But I'm
not arguing for not using BTF at all. I've myself contributed a lot of
time and effort to make BTF almost universal in BPF ecosystem. If I
didn't believe in BTF, I wouldn't add BTF-defined maps, just as a one
example of reliance on BTF.

All I'm asking is to let me fix a bug in BPF skeleton generation and
have a test validating that it is fixed.

> But BTF is not a debug-info. It's type and much more description of the program
> that is mandatory for the verification of the program.
> btf_id-based attach, CO-RE, trampoline, calling kernel funcs, etc all
> require BTF.

If fentry/fexit (whether it is btf_id-based or trampoline in your
classification), then BTF for the BPF program itself is not required
for it to work at all. Only kernel BTF is a requirement for those. But
we both know this very well. But just as a fun exercise, I just
double-checked by compiling fentry demo from libbpf-bootstrap ([0]).
It works just fine without `-g` and BTF.

  [0] https://github.com/libbpf/libbpf-bootstrap/blob/master/src/fentry.bpf.c

> Not because it's convenient to use BTF, but because assembly doesn't
> have enough information.
> There is no C, C++, Rust equivalent of BTF. There is none in the user
> space world.

Right, because of CO-RE relocations.

> Traditional languages translate a language into assembly code and cpus
> execute it.
> Static analyzers use high level languages to understand the intent.
> BPF workflow translates a language into assembly and BTF, so that the verifier
> can see the intent. It could happen that BPF will gain something else beyond BTF
> and it will become a mandatory part of the workflow. Just like BTF is today.

Yeah, that's fine and we do require BTF for new features (where it
makes sense, of course, not just arbitrarily). Both in kernel and in
libbpf. But kernel doesn't artificially impose BTF restrictions
either. See above about fentry/fexit, they work just fine without BTF.
So do a lot of other features. Same for libbpf, see libbpf_needs_btf()
and kernel_needs_btf().

> At that point all new features will be supported with that new
> "annotation" only,
> not because of "subjective personal preferences", but because that is
> a fundamental
> program description.

Map relocations, BPF-to-BPF calls, global variables, even multiple BPF
programs per-section are all pretty fundamental to BPF programming
with libbpf and they work just fine without BTF. But I agree, BTF is
great and should be leveraged where it provides extra value.