On Thu, Nov 9, 2023 at 7:20 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote: > > On Mon, 2023-10-30 at 22:03 -0700, Andrii Nakryiko wrote: > > Instead of allocating and copying jump history each time we enqueue > > child verifier state, switch to a model where we use one common > > dynamically sized array of instruction jumps across all states. > > > > The key observation for proving this is correct is that jmp_history is > > only relevant while state is active, which means it either is a current > > state (and thus we are actively modifying jump history and no other > > state can interfere with us) or we are checkpointed state with some > > children still active (either enqueued or being current). > > > > In the latter case our portion of jump history is finalized and won't > > change or grow, so as long as we keep it immutable until the state is > > finalized, we are good. > > > > Now, when state is finalized and is put into state hash for potentially > > future pruning lookups, jump history is not used anymore. This is > > because jump history is only used by precision marking logic, and we > > never modify precision markings for finalized states. > > > > So, instead of each state having its own small jump history, we keep > > a global dynamically-sized jump history, where each state in current DFS > > path from root to active state remembers its portion of jump history. > > Current state can append to this history, but cannot modify any of its > > parent histories. > > > > Because the jmp_history array can be grown through realloc, states don't > > keep pointers, they instead maintain two indexes [start, end) into > > global jump history array. End is exclusive index, so start == end means > > there is no relevant jump history. > > > > This should eliminate a lot of allocations and minimize overall memory > > usage (but I haven't benchmarked on real hardware, and QEMU benchmarking > > is too noisy). > > > > Also, in the next patch we'll extend jump history to maintain additional > > markings for some instructions even if there was no jump, so in > > preparation for that call this thing a more generic "instruction history". > > > > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx> > > Nitpick: could you please add a comment somewhere in the code > (is_state_visited? pop_stack?) saying something like this: > > states in the env->head happen to be sorted by insn_hist_end in > descending order, so popping next state for verification poses no > risk of overwriting history relevant for states remaining in > env->head. > > Side note: this change would make it harder to change states traversal > order to something other than DFS, should we chose to do so. I have the same concern. When we discussed different algorithms to solve open-coded-iters/bpf_loop issue non-DFS ideas came up multiple times. To be fair I didn't like them, because I wanted to preserve DFS property :) but I feel sooner or later we will be forced to explore non-DFS. So I think this patch is no go. There is really no need to rely on DFS here. Let instruction history consume more memory. It's a better long term trade off. We don't do strict DFS today. The speculative execution analysis is DFS, but it visits paths multiple times, so it's not a canonical DFS. It probably doesn't break this particular insn_hist approach, but still feels too fragile to rely on DFS assumption long term.