Hi Joanne, thanks for the review! > On Fri, 2022-06-03 at 17:06 -0700, Joanne Koong wrote: > Should this be moved to include/linux/bpf.h since > kernel/bpf/bpf_iter.c also uses this? Then we don't need the "#include > <linux/bpf_verifier.h>" in bpf_iter.c Will do. > Also, as a side-note, I think if we have an "is_valid" field, then we > don't need both the "callback_is_constant" and "flags_is_zero" fields. > If the flags is not zero or the callback subprogno is not the same on > each invocation of the instruction, then we could represent that by > just setting is_valid to false. Agree, I'll update the declaration as follows: struct bpf_loop_inline_state { bool initialized; /* set to true upon first entry */ bool can_inline; /* true if callback function * is the same at each call * and flags are always zero */ u32 callback_subprogno; /* valid when can_inline is true */ /* stack offset for loop vars; * u16 since bpf_subprog_info.stack_depth is u16; * we take the negative of it whenever we use it * since the stack grows downwards */ u16 stack_base; }; > Is placing this inside the union in "struct bpf_insn_aux_data" an option? This is an option indeed, will do. > > + > > + if (fit_for_bpf_loop_inline(aux)) { > > + if (!subprog_updated) { > > + subprog_updated = true; > > + subprogs[cur_subprog].stack_depth += BPF_REG_SIZE * 3; > > + stack_base = -subprogs[cur_subprog].stack_depth; > > + } > > + aux->loop_inline_state.stack_base = stack_base; > > + } > > + if (i == subprog_end - 1) { > > + subprog_updated = false; > > + cur_subprog++; > > + if (cur_subprog < env->subprog_cnt) > > + subprog_end = subprogs[cur_subprog + 1].start; > > + } > > + } > > + > > + env->prog->aux->stack_depth = env->subprog_info[0].stack_depth; > > In the case where a subprogram that is not subprogram 0 is a fit for > the bpf loop inline and thus increases its stack depth, won't > env->prog->aux->stack_depth need to also be updated? As far as I understand the logic in `do_check_main` and `jit_subprogs` `env->prog->aux->stack_depth` always reflects the stack depth of the first sub-program (aka `env->subprog_info[0].stack_depth`). So the last line of `adjust_stack_depth_for_loop_inlining` merely ensures this invariant. The stack depth for each relevant subprogram is updated earlier in the function: static void adjust_stack_depth_for_loop_inlining(struct bpf_verifier_env *env) { ... for (i = 0; i < insn_cnt; i++) { ... if (fit_for_bpf_loop_inline(aux)) { if (!subprog_updated) { subprog_updated = true; here ----> subprogs[cur_subprog].stack_depth += BPF_REG_SIZE * 3; ... } ... } if (i == subprog_end - 1) { subprog_updated = false; cur_subprog++; ... } } ... } Also, the patch v3 4/5 in a series has a test case "bpf_loop_inline stack locations for loop vars" which checks that stack offsets for spilled registers are assigned correctly for subprogram that is not a first subprogram. > > @@ -15030,6 +15216,9 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr) > > if (ret == 0) > > ret = check_max_stack_depth(env); > > > > + if (ret == 0) > > + adjust_stack_depth_for_loop_inlining(env); > > Do we need to do this before the check_max_stack_depth() call above > since adjust_stack_depth_for_loop_inlining() adjusts the stack depth? This is an interesting question. I used the following reasoning placing `adjust_stack_depth_for_loop_inlining` after the `check_max_stack_depth`: 1. If `adjust_stack_depth_for_loop_inlining` is placed before `check_max_stack_depth` some of the existing BPF programs might stop loading, because of increased stack usage. 2. To avoid (1) it is possible to do `check_max_stack_depth` first, remember actual max depth and apply `adjust_stack_depth_for_loop_inlining` only when there is enough stack space. However there are two downsides: - the optimization becomes flaky, similarly simple changes to the code of the BPF program might cause loop inlining to stop working; - the precise verification itself is non-trivial as each possible stack trace has to be analyzed in terms of stack size with loop inlining / stack size without loop inlining. If necessary, I will probably use a simpler heuristic where stack budget for register spilling would be computed as `MAX_BPF_STACK - actual_max_stack_depth` 3. Things are simpler if MAX_BPF_STACK is a soft limit that ensures that BPF programs consume limited amount of stack. In such case current implementation is sufficient. So this boils down to the question what is `MAX_BPF_STACK`: - a hard limit for the BPF program stack size? - a soft limit used to verify that programs consume a limited amount of stack while executing? If the answer is "hard limit" I'll proceed with implementation for option (2). Thanks, Eduard