On Sat, Jun 4, 2022 at 5:51 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote: > > Hi Joanne, thanks for the review! > > > On Fri, 2022-06-03 at 17:06 -0700, Joanne Koong wrote: [...] > > > > + > > > + if (fit_for_bpf_loop_inline(aux)) { > > > + if (!subprog_updated) { > > > + subprog_updated = true; > > > + subprogs[cur_subprog].stack_depth += BPF_REG_SIZE * 3; > > > + stack_base = -subprogs[cur_subprog].stack_depth; > > > + } > > > + aux->loop_inline_state.stack_base = stack_base; > > > + } > > > + if (i == subprog_end - 1) { > > > + subprog_updated = false; > > > + cur_subprog++; > > > + if (cur_subprog < env->subprog_cnt) > > > + subprog_end = subprogs[cur_subprog + 1].start; > > > + } > > > + } > > > + > > > + env->prog->aux->stack_depth = env->subprog_info[0].stack_depth; > > > > In the case where a subprogram that is not subprogram 0 is a fit for > > the bpf loop inline and thus increases its stack depth, won't > > env->prog->aux->stack_depth need to also be updated? > > As far as I understand the logic in `do_check_main` and `jit_subprogs` > `env->prog->aux->stack_depth` always reflects the stack depth of the > first sub-program (aka `env->subprog_info[0].stack_depth`). So the > last line of `adjust_stack_depth_for_loop_inlining` merely ensures > this invariant. The stack depth for each relevant subprogram is > updated earlier in the function: > > static void adjust_stack_depth_for_loop_inlining(struct bpf_verifier_env *env) > { > ... > for (i = 0; i < insn_cnt; i++) { > ... > if (fit_for_bpf_loop_inline(aux)) { > if (!subprog_updated) { > subprog_updated = true; > here ----> subprogs[cur_subprog].stack_depth += BPF_REG_SIZE * 3; > ... > } > ... > } > if (i == subprog_end - 1) { > subprog_updated = false; > cur_subprog++; > ... > } > } > ... > } > > Also, the patch v3 4/5 in a series has a test case "bpf_loop_inline > stack locations for loop vars" which checks that stack offsets for > spilled registers are assigned correctly for subprogram that is not a > first subprogram. > Thanks for the explanation :) I misunderstood what prog->aux->stack_depth is used for Looking at arch/arm64/net/bpf_jit_comp.c, I see this diagram /* * BPF prog stack layout * * high * original A64_SP => 0:+-----+ BPF prologue * |FP/LR| * current A64_FP => -16:+-----+ * | ... | callee saved registers * BPF fp register => -64:+-----+ <= (BPF_FP) * | | * | ... | BPF prog stack * | | * +-----+ <= (BPF_FP - prog->aux->stack_depth) * |RSVD | padding * current A64_SP => +-----+ <= (BPF_FP - ctx->stack_size) * | | * | ... | Function call stack * | | * +-----+ * low * */ It looks like prog->aux->stack_depth is used for the "BPF prog stack", which is the stack for the main bpf program (subprog 0) > > > @@ -15030,6 +15216,9 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr) > > > if (ret == 0) > > > ret = check_max_stack_depth(env); > > > > > > + if (ret == 0) > > > + adjust_stack_depth_for_loop_inlining(env); > > > > Do we need to do this before the check_max_stack_depth() call above > > since adjust_stack_depth_for_loop_inlining() adjusts the stack depth? > > This is an interesting question. I used the following reasoning > placing `adjust_stack_depth_for_loop_inlining` after the > `check_max_stack_depth`: > > 1. If `adjust_stack_depth_for_loop_inlining` is placed before > `check_max_stack_depth` some of the existing BPF programs might > stop loading, because of increased stack usage. > 2. To avoid (1) it is possible to do `check_max_stack_depth` first, > remember actual max depth and apply > `adjust_stack_depth_for_loop_inlining` only when there is enough > stack space. However there are two downsides: > - the optimization becomes flaky, similarly simple changes to the > code of the BPF program might cause loop inlining to stop > working; > - the precise verification itself is non-trivial as each possible > stack trace has to be analyzed in terms of stack size with loop > inlining / stack size without loop inlining. If necessary, I will > probably use a simpler heuristic where stack budget for register > spilling would be computed as > `MAX_BPF_STACK - actual_max_stack_depth` > 3. Things are simpler if MAX_BPF_STACK is a soft limit that ensures > that BPF programs consume limited amount of stack. In such case > current implementation is sufficient. > > So this boils down to the question what is `MAX_BPF_STACK`: > - a hard limit for the BPF program stack size? > - a soft limit used to verify that programs consume a limited amount > of stack while executing? > > If the answer is "hard limit" I'll proceed with implementation for > option (2). I'm not sure either whether MAX_BPF_STACK is a hard limit or a soft limit. I'm curious to know as well. > > Thanks, > Eduard >