On Tue, Sep 26, 2023 at 8:55 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote: > > On Mon, 2023-09-25 at 17:33 -0700, Andrii Nakryiko wrote: > [...] > > not working for this intermixed iterator case would be ok, I think > > most practical use cases would be a properly nested loops. > > > > But, compiler can actually do something like even for a proper loop. > > E.g., something like below > > > > for (; bpf_iter_num_next(&it); ) { > > .... > > } > > > > in assembly could be layed out as > > > > > > bpf_iter_num_next(&it); > > ... > > again: > > r0 = bpf_iter_num_next(&it) > > ... > > if (r0) goto again > > > > > > Or something along those lines. So these assumptions that the > > iterator's next() call is happening at the same instruction is > > problematic. > > For the specific example above I think it would not be an issue > because there is still only one next() call in the loop => > there would be no stalled next() transition states. > > I checked disassembly for progs/iters.c and it appears that for each > program there compiler preserves nested loops structure such that: > - each loop is driven by a single next() call of a dedicated iterator; > - each nested loop exit goes through destroy() for corresponding > iterator, meaning that outer loop's next will never see inner's loop > iterator as active => no stalled states due to inner loop > processing. > > Of-course, I'm not a robot and might have missed something that would > break real implementation. > > > > Option B. "Widening" > > > -------------------- > > > > > > The trivial fix for current .branches > 0 states comparison is to > > > force "exact" states comparisons for is_iter_next_insn() case: > > > 1. Ignore liveness marks, as those are not finalized yet; > > > 2. Ignore precision marks, as those are not finalized yet; > > > 3. Use regs_exact() for scalars comparison. > > > > > > This, however, is very restrictive as it fails to verify trivial > > > programs like (iters_looping/simplest_loop): > > > > > > sum = 0; > > > bpf_iter_num_new(&it, 0, 10); > > > while (!(r0 = bpf_iter_num_next())) > > > sum += *(u32 *)r0; > > > bpf_iter_num_destroy(&it); > > > > > > Here ever increasing bounds for "sum" prevent regs_exact() from ever > > > returning true. > > > > > > One way to lift this limitation is to borrow something like the > > > "widening" trick from the abstract interpretation papers mentioned > > > earlier: > > > - suppose there is current state C and a visited is_iter_next_insn() > > > state V with .branches > 0; > > > - suppose states_equal(V, C) returns true but exact states comparison > > > returns false because there are scalar values not marked as precise > > > that differ between V and C. > > > - copy C as C' and mark these scalars as __mark_reg_unbounded() in C'; > > > - run verification for C': > > > - if verification succeeds -- problem solved; > > > - if verification fails -- discard C' and proceed from C. > > > > > > Such algorithm should succeed for programs that don't use widened > > > values in "precise" context. > > > > > > Looking at testcases failing with trivial fix I think that such > > > limitations would be similar to those already present (e.g. the big > > > comment in progs/iters.c:iter_obfuscate_counter would still apply). > > > > > > > This makes sense. I was originally thinking along those lines (and > > rejected it for myself), but in a more eager (and thus restrictive) > > way: for any scalar register where old and new register states are > > equivalent thanks to read mark or precision bit (or rather lack of > > precision bit), force that precision/mark in old state. But that is > > too pessimistic for cases where we truly needed to simulate N+1 due to > > state differences, while keeping the old state intact. > > In other words there is a function states_equal' for comparison of > states when old{.branches > 0}, which differs from states_equal in > the following way: > - considers all registers read; > - considers all scalars precise. > Not really. The important aspect is to mark registers that were required to be imprecise in old state as "required to be imprecise", and if later we decide that this register has to be precise, too bad, too late, game over (which is why I didn't propose it, this seems too restrictive). > > What you propose with temporary C -> C' seems to be along similar > > lines, but will give "a second chance" if something doesn't work out: > > we'll go on N+1 instead of just failing. > > Right. > > > But let's think about this case. Let's say in V R1 is set to 7, but > > imprecise. In C we have R1 as 14, also imprecise. According to your > > algorithm, we'll create C1 with R1 set to <any num>. Now I have two > > questions: > > > > 1. How do we terminate C' validation? What happens when we get back to > > the next() call again and do states_equal() between V and C'? We just > > assume it's correct because C' has R1 as unbounded? > > For the definition above states_equal'(V, C') is false, but because we > are at checkpoint on the next iteration we would get to > states_equal'(C', C'') where C'' is derived from C' and same rules > would apply. If we are lucky nothing would change and there would no > point in scheduling another traversal. Ah, I see, "fixed point" state convergence is the hope, ok. > > > 2. Assuming yes. What if later on, V's R1 is actually forced to be > > precise, so only if V's R1=P7, then it is safe. But we already > > concluded that C' is safe based on R1 being unbounded, right? Isn't > > that a violation? > > My base assumption is that: > 1. any instruction reachable from V should also be reachable from C'; > 2. and that it is also guaranteed to be visited from C'. > I cannot give a formal reasoning for (2) at the moment. > > In case if this assumption holds, when verification proceeds from C' > it would eventually get to instruction for which R1 value is only safe > when R1=7, if so: > - verification of C' conjecture would fail; > - all states derived from C' would need to be removed from > states stack / explored states; > - verification should proceed from C. > (And there is also a question of delaying read/precision propagation > from unproven conjecture states to regular states). > > (Also I'm still not sure whether to use regs_exact() for scalars > comparison instead of precision logic in states_equal'). Yeah, it's getting complicated, eh? :) But aside from this approach, I was thinking back to my proposal to combine BFS and DFS approaches. Let's give it another look. Quoting from my earlier email: The idea is to let verifier explore all code paths starting from iteration #N, except the code paths that lead to looping into iteration #N+1. I tried to do that with validating NULL case first and exiting from loop on each iteration (first), but clearly that doesn't capture all the cases, as Eduard have shown. So what if we delay convergence state checks (and then further exploration at iteration #N+1) using BFS instead of DFS? That is, when we are about to start new iteration and check state convergence, we enqueue current state to be processed later after all the states that "belong" to iteration #N are processed. I still think that the whole "let's explore all states except the looping back ones" idea is the right direction. But the above is a bit imprecise about "iteration #N and iteration #N+1" parts. We interpreted it as when we get back to *any* next() call for a given iterator, then we postpone the checks. And you've shown with your counter examples how we actually don't explore all possible code paths. Let's see if we can actually fix that problem and make sure that we do explore everything that is terminatable. How about the following clarifications/refinements to the above: 1. If V and C (using your terminology from earlier, where V is the old parent state at some next() call instruction, and C is the current one on the same instruction) are different -- we just keep going. So always try to explore different input states for the loop. 2. But if V and C are equivalent, it's too early to conclude anything. So enqueue C for later in a separate BFS queue (and perhaps that queue is per-instruction, actually; or maybe even per-state, not sure), and keep exploring all the other pending queues from the (global) DFS queue, until we get back to state V again. At that point we need to start looking at postponed states for that V state. But this time we should be sure that precision and read marks are propagated from all those terminatable code paths. Basically, this tries to make sure that we do mark every register that is important for all the branching decision making, memory dereferences, etc. And just avoids going into endless loops with the same input conditions. Give it some fresh thought and let's see if we are missing something again. Thanks!