Re: [PATCH bpf-next v2 1/4] bpf: verify scalar ids mapping in regsafe() using check_ids()

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Wed, 31 May 2023 17:23:36 -0700

On Wed, May 31, 2023 at 4:42 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> On Wed, 2023-05-31 at 15:54 -0700, Andrii Nakryiko wrote:
> > [...]
> >
> > well, what can I say... force all imprecise logic isn't that
> > straightforward either, but so far it still holds. And the big idea
> > here is similar: whatever happens between two checkpoints doesn't
> > matter if its effect is not visible at the end of the checkpoint.
> >
> > So I guess the intent of my proposal is correct: every time we mark r6
> > as precise, we should mark r7 as well in each state in which they are
> > still linked. Which necessitates to do this on each walk up the state
> > chain.
> >
> > At least let's give it a try and see how it holds up against existing
> > tests and whatever test you can come up with?..
>
> I'll try this thing, thanks a lot for all the input!
> Hopefully will get back tomorrow.

Great, let's see how it goes.

>
> > [...]
> >
> > BTW, I did contemplate extending jmp_history to contain extra flags
> > about "interesting" instructions, though. Specifically, last
> > unsupported case for precision backtracking is when register other
> > than r10 is used for stack access (which can happen when one passes a
> > pointer to a SCALAR to parent function's stack), for which having a
> > bit next to such instruction saying "this is really a stack access"
> > would help cover the last class of unsupported situations.
>
> Yes, it would have required some kind of redesign for this case as
> well. My expectation is that only a few registers get range on each
> find_equal_scalars() call, so storing big masks for all frame levels
> is very sub-optimal.
>
> > But this is a pretty significant complication. And to make it really
> > practical, we'd need to think very hard on how to implement
> > jmp_history more efficiently, without constant reallocations. I have a
> > hunch that jmp_history should be one large resizable array that all
> > states share and just point into different parts of it. When state is
> > pushed to the stack, we just remember at which index it starts. When a
> > state is finalized, its jump history segment shouldn't be needed by
> > that state and can be reused by its siblings and parent states.
> > Ultimately, we only have a linear chain of actively worked-on states
> > which do use jmp_history, and all others either don't need it
> > *anymore* (verified states) or don't need it *yet* (enqueued states).
> >
> > This would allow us to even have an exact "log of execution" with
> > insn_idx and associated extra information, but it will be code path
> > dependent, unlike bpf_insn_aux. And the best property is that it will
> > never grow beyond 1 million instructions deep (absolute worst case).
>
> I'm not sure I understand why is it bounded.
> Conditionals and loops potentially give big number of possible
> execution paths over a small number of instructions.
> So, keeping per-path / per-instruction still is going to blow up in
> terms of memory use. To keep it bounded (?) something smart needs to
> know which visited states could never be visited again.

#define BPF_COMPLEXITY_LIMIT_INSNS      1000000 /* yes. 1M insns */

that's the bound, you can't go above that

>
> > We might not even need backtracking in its current form if we just
> > proactively maintain involved registers information (something that we
> > currently derive during instruction interpretation in backtrack_insn).
>
> Well, precision marks still have to be propagated backwards, so some
> form of "backward movement" within a state will have to happen anyway.
> Like, we have a combination of forward and backward analyses in any case.
> Sounds a bit like you want to converge the design to some kind of a
> classical data flow analysis, but have states instead of basic blocks.

It's just an idea. And yes, there will still be backtracking, just
less of "let's try to understand what's going on based on instruction
itself". So instead of deriving that we are exiting from subprog, we
can have a flag that says "this instruction is exit from subprog". The
advantage would be an overall simplification and keeping instruction
interpretation in one place.

But this is offtopic, we can talk about this separately. It's not
urgent and not even imminent change.

>
> > So at some point I'd like to get to thinking and implementing this,
> > but it isn't the most pressing need right now, of course.
> > [...]