On Wed, 2023-05-31 at 15:54 -0700, Andrii Nakryiko wrote: > [...] > > well, what can I say... force all imprecise logic isn't that > straightforward either, but so far it still holds. And the big idea > here is similar: whatever happens between two checkpoints doesn't > matter if its effect is not visible at the end of the checkpoint. > > So I guess the intent of my proposal is correct: every time we mark r6 > as precise, we should mark r7 as well in each state in which they are > still linked. Which necessitates to do this on each walk up the state > chain. > > At least let's give it a try and see how it holds up against existing > tests and whatever test you can come up with?.. I'll try this thing, thanks a lot for all the input! Hopefully will get back tomorrow. > [...] > > BTW, I did contemplate extending jmp_history to contain extra flags > about "interesting" instructions, though. Specifically, last > unsupported case for precision backtracking is when register other > than r10 is used for stack access (which can happen when one passes a > pointer to a SCALAR to parent function's stack), for which having a > bit next to such instruction saying "this is really a stack access" > would help cover the last class of unsupported situations. Yes, it would have required some kind of redesign for this case as well. My expectation is that only a few registers get range on each find_equal_scalars() call, so storing big masks for all frame levels is very sub-optimal. > But this is a pretty significant complication. And to make it really > practical, we'd need to think very hard on how to implement > jmp_history more efficiently, without constant reallocations. I have a > hunch that jmp_history should be one large resizable array that all > states share and just point into different parts of it. When state is > pushed to the stack, we just remember at which index it starts. When a > state is finalized, its jump history segment shouldn't be needed by > that state and can be reused by its siblings and parent states. > Ultimately, we only have a linear chain of actively worked-on states > which do use jmp_history, and all others either don't need it > *anymore* (verified states) or don't need it *yet* (enqueued states). > > This would allow us to even have an exact "log of execution" with > insn_idx and associated extra information, but it will be code path > dependent, unlike bpf_insn_aux. And the best property is that it will > never grow beyond 1 million instructions deep (absolute worst case). I'm not sure I understand why is it bounded. Conditionals and loops potentially give big number of possible execution paths over a small number of instructions. So, keeping per-path / per-instruction still is going to blow up in terms of memory use. To keep it bounded (?) something smart needs to know which visited states could never be visited again. > We might not even need backtracking in its current form if we just > proactively maintain involved registers information (something that we > currently derive during instruction interpretation in backtrack_insn). Well, precision marks still have to be propagated backwards, so some form of "backward movement" within a state will have to happen anyway. Like, we have a combination of forward and backward analyses in any case. Sounds a bit like you want to converge the design to some kind of a classical data flow analysis, but have states instead of basic blocks. > So at some point I'd like to get to thinking and implementing this, > but it isn't the most pressing need right now, of course. > [...]