On Thu, May 04, 2023 at 01:57:15PM -0700, Andrii Nakryiko wrote: > On Wed, May 3, 2023 at 8:14 PM Alexei Starovoitov > <alexei.starovoitov@xxxxxxxxx> wrote: > > > > On Tue, Apr 25, 2023 at 04:49:07PM -0700, Andrii Nakryiko wrote: > > > Fix propagate_precision() logic to perform propagation of all necessary > > > registers and stack slots across all active frames *in one batch step*. > > > > > > Doing this for each register/slot in each individual frame is wasteful, > > > but the main problem is that backtracking of instruction in any frame > > > except the deepest one just doesn't work. This is due to backtracking > > > logic relying on jump history, and available jump history always starts > > > (or ends, depending how you view it) in current frame. So, if > > > prog A (frame #0) called subprog B (frame #1) and we need to propagate > > > precision of, say, register R6 (callee-saved) within frame #0, we > > > actually don't even know where jump history that corresponds to prog > > > A even starts. We'd need to skip subprog part of jump history first to > > > be able to do this. > > > > > > Luckily, with struct backtrack_state and __mark_chain_precision() > > > handling bitmasks tracking/propagation across all active frames at the > > > same time (added in previous patch), propagate_precision() can be both > > > fixed and sped up by setting all the necessary bits across all frames > > > and then performing one __mark_chain_precision() pass. This makes it > > > unnecessary to skip subprog parts of jump history. > > > > > > We also improve logging along the way, to clearly specify which > > > registers' and slots' precision markings are propagated within which > > > frame. > > > > > > Fixes: 529409ea92d5 ("bpf: propagate precision across all frames, not just the last one") > > > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx> > > > --- > > > kernel/bpf/verifier.c | 49 +++++++++++++++++++++++++++---------------- > > > 1 file changed, 31 insertions(+), 18 deletions(-) > > > > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > > > index 0b19b3d9af65..66d64ac10fb1 100644 > > > --- a/kernel/bpf/verifier.c > > > +++ b/kernel/bpf/verifier.c > > > @@ -3885,14 +3885,12 @@ int mark_chain_precision(struct bpf_verifier_env *env, int regno) > > > return __mark_chain_precision(env, env->cur_state->curframe, regno, -1); > > > } > > > > > > -static int mark_chain_precision_frame(struct bpf_verifier_env *env, int frame, int regno) > > > +static int mark_chain_precision_batch(struct bpf_verifier_env *env, int frame) > > > { > > > - return __mark_chain_precision(env, frame, regno, -1); > > > -} > > > - > > > -static int mark_chain_precision_stack_frame(struct bpf_verifier_env *env, int frame, int spi) > > > -{ > > > - return __mark_chain_precision(env, frame, -1, spi); > > > + /* we assume env->bt was set outside with desired reg and stack masks > > > + * for all frames > > > + */ > > > + return __mark_chain_precision(env, frame, -1, -1); > > > } > > > > > > static bool is_spillable_regtype(enum bpf_reg_type type) > > > @@ -15308,20 +15306,25 @@ static int propagate_precision(struct bpf_verifier_env *env, > > > struct bpf_reg_state *state_reg; > > > struct bpf_func_state *state; > > > int i, err = 0, fr; > > > + bool first; > > > > > > for (fr = old->curframe; fr >= 0; fr--) { > > > state = old->frame[fr]; > > > state_reg = state->regs; > > > + first = true; > > > for (i = 0; i < BPF_REG_FP; i++, state_reg++) { > > > if (state_reg->type != SCALAR_VALUE || > > > !state_reg->precise || > > > !(state_reg->live & REG_LIVE_READ)) > > > continue; > > > - if (env->log.level & BPF_LOG_LEVEL2) > > > - verbose(env, "frame %d: propagating r%d\n", fr, i); > > > - err = mark_chain_precision_frame(env, fr, i); > > > - if (err < 0) > > > - return err; > > > + if (env->log.level & BPF_LOG_LEVEL2) { > > > + if (first) > > > + verbose(env, "frame %d: propagating r%d", fr, i); > > > + else > > > + verbose(env, ",r%d", i); > > > + } > > > + bt_set_frame_reg(&env->bt, fr, i); > > > + first = false; > > > } > > > > > > for (i = 0; i < state->allocated_stack / BPF_REG_SIZE; i++) { > > > @@ -15332,14 +15335,24 @@ static int propagate_precision(struct bpf_verifier_env *env, > > > !state_reg->precise || > > > !(state_reg->live & REG_LIVE_READ)) > > > continue; > > > - if (env->log.level & BPF_LOG_LEVEL2) > > > - verbose(env, "frame %d: propagating fp%d\n", > > > - fr, (-i - 1) * BPF_REG_SIZE); > > > - err = mark_chain_precision_stack_frame(env, fr, i); > > > - if (err < 0) > > > - return err; > > > + if (env->log.level & BPF_LOG_LEVEL2) { > > > + if (first) > > > + verbose(env, "frame %d: propagating fp%d", > > > + fr, (-i - 1) * BPF_REG_SIZE); > > > + else > > > + verbose(env, ",fp%d", (-i - 1) * BPF_REG_SIZE); > > > + } > > > + bt_set_frame_slot(&env->bt, fr, i); > > > + first = false; > > > } > > > + if (!first) > > > + verbose(env, "\n"); > > > } > > > + > > > + err = mark_chain_precision_batch(env, old->curframe); > > > > The optimization makes sense, sort-of, but 'first' flag is to separate frames on > > different lines ? > > Yes, first is purely for formatting. Each frame will have a separate > line, but all registers and stack frames for that frame will be on > that single line, like so: > > frame 1: propagating r1,r2,r3,fp-8,fp-16 > frame 0: propagating r3,r9,fp-120 I see. The example of the output helped a lot. Now the code makes sense. Pls include it in a commit or a comment.