Re: [PATCH bpf-next v2 1/4] bpf: verify scalar ids mapping in regsafe() using check_ids()

Eduard Zingerman <eddyz87@xxxxxxxxx> · Thu, 01 Jun 2023 02:42:10 +0300

On Wed, 2023-05-31 at 15:54 -0700, Andrii Nakryiko wrote:
> [...] 
>
> well, what can I say... force all imprecise logic isn't that
> straightforward either, but so far it still holds. And the big idea
> here is similar: whatever happens between two checkpoints doesn't
> matter if its effect is not visible at the end of the checkpoint.
> 
> So I guess the intent of my proposal is correct: every time we mark r6
> as precise, we should mark r7 as well in each state in which they are
> still linked. Which necessitates to do this on each walk up the state
> chain.
> 
> At least let's give it a try and see how it holds up against existing
> tests and whatever test you can come up with?..

I'll try this thing, thanks a lot for all the input!
Hopefully will get back tomorrow.

> [...]
> 
> BTW, I did contemplate extending jmp_history to contain extra flags
> about "interesting" instructions, though. Specifically, last
> unsupported case for precision backtracking is when register other
> than r10 is used for stack access (which can happen when one passes a
> pointer to a SCALAR to parent function's stack), for which having a
> bit next to such instruction saying "this is really a stack access"
> would help cover the last class of unsupported situations.

Yes, it would have required some kind of redesign for this case as
well. My expectation is that only a few registers get range on each
find_equal_scalars() call, so storing big masks for all frame levels
is very sub-optimal.

> But this is a pretty significant complication. And to make it really
> practical, we'd need to think very hard on how to implement
> jmp_history more efficiently, without constant reallocations. I have a
> hunch that jmp_history should be one large resizable array that all
> states share and just point into different parts of it. When state is
> pushed to the stack, we just remember at which index it starts. When a
> state is finalized, its jump history segment shouldn't be needed by
> that state and can be reused by its siblings and parent states.
> Ultimately, we only have a linear chain of actively worked-on states
> which do use jmp_history, and all others either don't need it
> *anymore* (verified states) or don't need it *yet* (enqueued states).
>
> This would allow us to even have an exact "log of execution" with
> insn_idx and associated extra information, but it will be code path
> dependent, unlike bpf_insn_aux. And the best property is that it will
> never grow beyond 1 million instructions deep (absolute worst case).

I'm not sure I understand why is it bounded.
Conditionals and loops potentially give big number of possible
execution paths over a small number of instructions.
So, keeping per-path / per-instruction still is going to blow up in
terms of memory use. To keep it bounded (?) something smart needs to
know which visited states could never be visited again.

> We might not even need backtracking in its current form if we just
> proactively maintain involved registers information (something that we
> currently derive during instruction interpretation in backtrack_insn).

Well, precision marks still have to be propagated backwards, so some
form of "backward movement" within a state will have to happen anyway.
Like, we have a combination of forward and backward analyses in any case.
Sounds a bit like you want to converge the design to some kind of a
classical data flow analysis, but have states instead of basic blocks.

> So at some point I'd like to get to thinking and implementing this,
> but it isn't the most pressing need right now, of course.
> [...]