Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Mon, 2 Oct 2023 17:05:21 -0700

On Mon, Oct 2, 2023 at 10:18 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> On Mon, 2023-10-02 at 09:29 -0700, Alexei Starovoitov wrote:
> [...]
> > > I'd like to argue about B "widening" for a bit, as I think it might be
> > > interesting in general, and put A aside for now. The algorithm for
> > > widening looks as follows:
> > > - In is_states_equal() for (sl->state.branches > 0 && is_iter_next_insn()) case:
> > >   - Check if states are equal exactly:
> > >     - ignore liveness marks on old state;
> > >     - demand same type for registers and stack slots;
> > >     - ignore precision marks, instead compare scalars using
> > >       regs_exact() [this differs from my previous emails, I'm now sure
> > >       that for this scheme to be correct regs_exact() is needed].
> > >   - If there is an exact match then follow "hit" branch. The idea
> > >     being that visiting exactly the same state can't produce new
> > >     execution paths (like with graph traversal).
> >
> > Right. Exactly the same C state won't produce new paths
> > as seen in visited state V, but
> > if C==V at the same insn indx it means we're in the infinite loop.
>
> This is true in general, but for bpf_iter_next() we have a guarantee
> that iteration would end eventually.
>
> > > More formally, before pruning potential looping states we need to
> > > make sure that all precision and read marks are in place.
> > > To achieve this:
> > > - Process states from env->head while those are available, in case if
> > >   potential looping state (is_states_equal()) is reached put it to a
> > >   separate queue.
> > > - Once all env->head states are processed the only source for new read
> > >   and precision marks is in postponed looping states, some of which
> > >   might not be is_states_equal() anymore. Submit each such state for
> > >   verification until fixed point is reached (repeating steps for
> > >   env->head processing).
> >
> > Comparing if (sl->state.branches) makes sense to find infinite loop.
> > It's waste for the verifier to consider visited state V with branches > 0
> > for pruning.
> > The safety of V is unknown. The lack of liveness and precision
> > is just one part. The verifier didn't conclude that V is safe yet.
> > The current state C being equivalent to V doesn't tell us anything.
> >
> > If infinite loop detection logic trips us, let's disable it.
> > I feel the fix should be in process_iter_next_call() to somehow
> > make it stop doing push_stack() when states_equal(N-1, N-2).
>
> Consider that we get to the environment state where:
> - all env->head states are exhausted;
> - all potentially looping states (stored in as a separate set of
>   states instead of env->head) are states_equal() to some already
>   explored state.
>
> I argue that if such environment state is reached the program should
> be safe, because:
> - Each looping state L is a sub-state of some explored state V and
>   every path from V leads to either safe exit or another loop.
> - Iterator loops are guaranteed to exit eventually.

It sounds correct, but I don't like that the new mechanism
with two stacks of states completely changes the way the verifier works.
The check you proposed:
if (env->stack_size != 0)
      push_iter_stack()
rings alarm bells.

env->stack_size == 0 (same as env->head exhausted) means we're done
with verification (ignoring bpf_exit in callbacks and subprogs).
So above check looks like a hack for something that I don't understand yet.
Also there could be branches in the code before and after iter loop.
With "opportunistic" states_equal() for looping states and delayed
reschedule_loop_states() to throw states back at the verifier
the whole verification model is non comprehensible (at least to me).
The stack + iter_stack + reschedule_loop_states means that in the following:
foo()
{
  br1 // some if() {...} block
  loop {
    br2
  }
  br3
}

the normal verifier pop_stack logic will check br3 and br1,
but br2 may or may not be checked depending on "luck" of states_equal
and looping states that will be in iter_stack.
Then the verifier will restart from checking loop-ing states.
If they somehow go past the end of the loop all kinds of things go crazy.
update_branch_counts() might warn, propagate_liveness, propagate_precision
will do nonsensical things.
This out-of-order state processing distorts the existing model so
much that I don't see how we can reason about these two stacks verification.

I think the cleaner solution is to keep current single stack model.
In the above example the verifier would reach the end, then check br3,
then check br2,
then we need to split branches counter somehow, so that we can
compare loop iter states with previous visited states that are known
to be safe.
In visited states we explored everything in br3 and in br2,
so no concerns that some path inside the loop or after the loop
missed precision or liveness.
Maybe we can split branches counter into branches due to 'if' blocks
and branches due to process_iter_next_call().
If there are pending process_iter_next_call-caused states it's still
ok to call states_equal on such visited states.

I could be missing something, of course.