Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)

Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> · Mon, 18 Sep 2023 00:09:17 +0200

On Sun, 17 Sept 2023 at 23:37, Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> On Fri, 2023-07-07 at 11:08 -0700, Andrii Nakryiko wrote:
> > On Fri, Jul 7, 2023 at 9:44 AM Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > On Fri, Jul 7, 2023 at 7:04 AM Andrew Werner <awerner32@xxxxxxxxx> wrote:
> > > >
> > > > When it comes to fixing the problem, I don't quite know where to start.
> > > > Perhaps these iteration callbacks ought to be treated more like global functions
> > > > -- you can't always make assumptions about the state of the data in the context
> > > > pointer. Treating the context pointer as totally opaque seems bad from
> > > > a usability
> > > > perspective. Maybe there's a way to attempt to verify the function
> > > > body of the function
> > > > by treating all or part of the context as read-only, and then if that
> > > > fails, go back and
> > > > assume nothing about that part of the context structure. What is the
> > > > right way to
> > > > think about plugging this hole?
> > >
> > > 'treat as global' might be a way to fix it, but it will likely break
> > > some setups, since people passing pointers in a context and current
> > > global func verification doesn't support that.
> >
> > yeah, the intended use case is to return something from callbacks
> > through context pointer. So it will definitely break real uses.
> >
> > > I think the simplest fix would be to disallow writes into any pointers
> > > within a ctx. Writes to scalars should still be allowed.
> >
> > It might be a partial mitigation, but even with SCALARs there could be
> > problems due to assumed SCALAR range, which will be invalid if
> > callback is never called or called many times.
> >
> > > Much more complex fix would be to verify callbacks as
> > > process_iter_next_call(). See giant comment next to it.
> >
> > yep, that seems like the right way forward. We need to simulate 0, 1,
> > 2, .... executions of callbacks until we validate and exhaust all
> > possible context modification permutations, just like open-coded
> > iterators do it
> >
> > > But since we need a fix for stable I'd try the simple approach first.
> > > Could you try to implement that?
>
> Hi All,
>
> This issue seems stalled, so I took a look over the weekend.
> I have a work in progress fix, please let me know if you don't agree
> with direction I'm taking or if I'm stepping on anyone's toes.
>

I was planning to get to this eventually, but it's great if you are
looking to do it.

> After some analysis I decided to go with Alexei's suggestion and
> implement something similar to iterators for selected set of helper
> functions that accept "looping" callbacks, such as:
> - bpf_loop
> - bpf_for_each_map_elem
> - bpf_user_ringbuf_drain
> - bpf_timer_set_callback (?)
>

The last one won't require this, I think. The callback only runs once,
and asynchronously.
Others you are missing are bpf_find_vma and the bpf_rbtree_add kfunc.
While the latter is not an interator, it can invoke the same callback
an unknown number of times.

The other major thing that needs to be checked is cases where callback
will be executed zero times. There is some discussion on this in [0],
where this bug was reported originally. Basically, we need to explore
a path where the callback execution does not happen and ensure the
program is still safe.

[0]: https://lore.kernel.org/bpf/CAP01T74cOJzo3xQcW6weURH+mYRQ7kAWMqQOgtd_ymSbhrOoMQ@xxxxxxxxxxxxxx

You could also consider taking the selftests from this link, some of
them allow completely breaking safety properties of the verifier.

> The sketch of the fix looks as follows:
> - extend struct bpf_func_state with callback iterator state information:
>   - active/non-active flag
>   - depth
>   - r1-r5 state at the entry to callback;
> - extend __check_func_call() to setup callback iterator state when
>   call to suitable helper function is processed;
> - extend BPF_EXIT processing (prepare_func_exit()) to queue new
>   callback visit upon return from iterating callback
>   (similar to process_iter_next_call());
> - extend is_state_visited() to account for callback iterator hits in a
>   way similar to regular iterators;
> - extend backtrack_insn() to correctly react to jumps from callback
>   exit to callback entry.
>
> I have a patch (at the end of this email) that correctly recognizes
> the bpf_loop example in this thread as unsafe. However this patch has
> a few deficiencies:
>
> - verif_scale_strobemeta_bpf_loop selftest is not passing, because
>   read_map_var function invoked from the callback continuously
>   increments 'payload' pointer used in subsequent iterations.
>
>   In order to handle such code I need to add an upper bound tracking
>   for iteration depth (when such bound could be deduced).
>

Yes, either the loop converges before the upper limit is hit, or we
keep unrolling it until we exhaust the deduced loop count passed to
the bpf_loop helper. But we will need to mark the loop count argument
register as precise when we make use of its value in such a case.

> - loop detection is broken for simple callback as below:
>
>   static int loop_callback_infinite(__u32 idx, __u64 *data)
>   {
>       for (;;)
>           (*ctx)++;
>       return 0;
>   }
>
>   To handle such code I need to change is_state_visited() to do
>   callback iterator loop/hit detection on exit from callback
>   (returns are not prune points at the moment), currently it is done
>   on entry.
>
> - the callback as bellow currently causes state explosion:
>
>   static int precise1_callback(__u32 idx, struct precise1_ctx *ctx)
>   {
>       if (idx == 0)
>           ctx->a = 1;
>       if (idx == 1 && ctx->a == 1)
>           ctx->b = 1;
>       return 0;
>   }
>
>   I'm not sure yet what to do about this, there are several possibilities:
>   - tweak the order in which states are visited (need to think about it);
>   - check states in bpf_verifier_env::head (not explored yet) for
>     equivalence and avoid enqueuing duplicate states.
>
>
> I'll proceed addressing the issues above on Monday.
>

I think there is a class of programs that are nevertheless going to be
broken now, as bpf_loop and others basically permitted incorrect code
to pass through. And the iteration until we arrive at a fixpoint won't
be able to cover all classes of loop bodies, so I think we will need
to make a tradeoff in terms of expressibility vs verifier complexity.
In general, cases like this with branches/conditional exit from the
loop body which do not converge will lead to state explosion anyway.

> Thanks,
> Eduard
>
> ---
>
> [...]