Re: [PATCH v3 bpf-next 0/9] bpf: bounded loops and other features

Alexei Starovoitov <ast@xxxxxx> · Mon, 17 Jun 2019 18:57:40 +0000

On 6/17/19 9:39 AM, Andrii Nakryiko wrote:
> On Sat, Jun 15, 2019 at 12:12 PM Alexei Starovoitov <ast@xxxxxxxxxx> wrote:
>>
>> v2->v3: fixed issues in backtracking pointed out by Andrii.
>> The next step is to add a lot more tests for backtracking.
>>
> 
> Tests would be great, verifier complexity is at the level, where it's
> very easy to miss issues.
> 
> Was fuzzying approach ever discussed for BPF verifier? I.e., have a
> fuzzer to generate both legal and illegal random small programs. Then
> re-implement verifier as user-level program with straightforward
> recursive exhaustive verification (so no state pruning logic, no
> precise/coarse, etc, just register/stack state tracking) of all
> possible branches. If kernel verifier's verdict differs from
> user-level verifier's verdict - flag that as a test case and figure
> out why they differ. Obviously that would work well only for small
> programs, but that should be a good first step already.
> 
> In addition, if this is done, that user-land verifier can be a HUGE
> help to BPF application developers, as libbpf would (potentially) be
> able to generate better error messages using it as well.

In theory that sounds good, but doesn't work in practice.
The kernel verifier keeps changing faster than user space can catch up.
It's also relying on loaded maps and all sorts of callbacks that
check context, allowed helpers, maps, combinations of them from all
over the kernel.
The last effort to build kernel verifier as-is into .o and link
with kmalloc/map wrappers in user space was here:
https://github.com/iovisor/bpf-fuzzer
It was fuzzing the verifier and was able to find few minor bugs.
But it quickly bit rotted.

Folks brought up in the past the idea to collect user space
verifiers from different kernels, so that user space tooling can
check whether particular program will load on a set of kernels
without need to run them in VMs.
Even if such feature existed today it won't really solve this production
headache, since all kernels prior to today will not be covered.

I think syzbot is still generating bpf programs. iirc it found
one bug in the past in the verifier core.
I think the only way to make verifier more robust is to keep
adding new test cases manually.
Most interesting bugs we found by humans.

Another approach to 'better error message' that was considered
in the past was to teach llvm to recognize things that verifier
will reject and let llvm warn on them.
But it's also not practical. We had llvm error on calls.
Then we added them to the verifier and had to change llvm.
If we had llvm error on loops, now we'd need to change it.
imo it's better to let llvm handle everything.