Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Mon, 2 Oct 2023 09:29:58 -0700

On Sun, Oct 1, 2023 at 6:40 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> On Fri, 2023-09-29 at 17:41 -0700, Alexei Starovoitov wrote:
> [...]
> > I suspect that neither option A "Exit or Loop" or B "widening"
> > are not correct.
> > In both cases we will do states_equal() with states that have
> > not reached exit and don't have live/precision marks.
>
> I'd like to argue about B "widening" for a bit, as I think it might be
> interesting in general, and put A aside for now. The algorithm for
> widening looks as follows:
> - In is_states_equal() for (sl->state.branches > 0 && is_iter_next_insn()) case:
>   - Check if states are equal exactly:
>     - ignore liveness marks on old state;
>     - demand same type for registers and stack slots;
>     - ignore precision marks, instead compare scalars using
>       regs_exact() [this differs from my previous emails, I'm now sure
>       that for this scheme to be correct regs_exact() is needed].
>   - If there is an exact match then follow "hit" branch. The idea
>     being that visiting exactly the same state can't produce new
>     execution paths (like with graph traversal).

Right. Exactly the same C state won't produce new paths
as seen in visited state V, but
if C==V at the same insn indx it means we're in the infinite loop.

>   - If there is no exact match but there is some state V which for
>     which states_equal(V, C) and V and C differ only in inexact
>     scalars:
>     - copy C to state C';
>     - mark range for above mentioned inexact scalars as unbound;
>     - continue verification from C';
>     - if C' verification fails discard it and resume verification from C.
>
> The hope here is that guess for "wide" scalars would be correct and
> range for those would not matter. In such case C' would be added to
> the explored states right after it's creation (as it is a checkpoint),
> and later verification would get to the explored C' copy again, so
> that exact comparison would prune further traversals.
>
> Unfortunately, this is riddled with a number of technical
> complications on implementation side.
> What Andrii suggests might be simpler.
>
> > The key aspect of the verifier state pruning is that safe states
> > in sl list explored all possible paths.
> > Say, there is an 'if' block after iter_destroy.
> > The push_stack/pop_stack logic is done as a stack to make sure
> > that the last 'if' has to be explored before earlier 'if' blocks are checked.
> > The existing bpf iter logic violated that.
> > To fix that we need to logically do:
> >
> > if (is_iter_next_insn(env, insn_idx))
> >   if (states_equal(env, &sl->state, cur)) {
> >    if (iter_state->iter.state == BPF_ITER_STATE_DRAINED)
> >      goto hit;
> >
> > instead of BPF_ITER_STATE_ACTIVE.
> > In other words we need to process loop iter == 0 case first
> > all the way till bpf_exit (just like we do right now),
> > then process loop iter == 1 and let it exit the loop and
> > go till bpf_exit (hopefully state prune will find equal states
> > right after exit from the loop).
> > then process loop iter == 2 and so on.
> > If there was an 'if' pushed to stack during loop iter == 1
> > it has to be popped and explored all the way till bpf_exit.
> >
> > We cannot just replace BPF_ITER_STATE_ACTIVE with DRAINED.
> > It would still be ACTIVE with sl->state.branches==0 after that
> > state explored all paths to bpf_exit.
>
> This is similar to what Andrii suggests, please correct me if I'm wrong:
>
> > 1. If V and C (using your terminology from earlier, where V is the old
> > parent state at some next() call instruction, and C is the current one
> > on the same instruction) are different -- we just keep going. So
> > always try to explore different input states for the loop.
>
> > 2. But if V and C are equivalent, it's too early to conclude anything.
> > So enqueue C for later in a separate BFS queue (and perhaps that queue
> > is per-instruction, actually; or maybe even per-state, not sure), and
> > keep exploring all the other pending queues from the (global) DFS
> > queue, until we get back to state V again. At that point we need to
> > start looking at postponed states for that V state. But this time we
> > should be sure that precision and read marks are propagated from all
> > those terminatable code paths.
>
> More formally, before pruning potential looping states we need to
> make sure that all precision and read marks are in place.
> To achieve this:
> - Process states from env->head while those are available, in case if
>   potential looping state (is_states_equal()) is reached put it to a
>   separate queue.
> - Once all env->head states are processed the only source for new read
>   and precision marks is in postponed looping states, some of which
>   might not be is_states_equal() anymore. Submit each such state for
>   verification until fixed point is reached (repeating steps for
>   env->head processing).

Comparing if (sl->state.branches) makes sense to find infinite loop.
It's waste for the verifier to consider visited state V with branches > 0
for pruning.
The safety of V is unknown. The lack of liveness and precision
is just one part. The verifier didn't conclude that V is safe yet.
The current state C being equivalent to V doesn't tell us anything.

If infinite loop detection logic trips us, let's disable it.
I feel the fix should be in process_iter_next_call() to somehow
make it stop doing push_stack() when states_equal(N-1, N-2).