On Mon, 2024-07-01 at 17:41 -0700, Andrii Nakryiko wrote:

[...]

> > - clang generates a simple pattern for nocsr calls, e.g.:
> >
> > r1 = 1;
> > r2 = 2;
> > *(u64 *)(r10 - 8) = r1;
> > *(u64 *)(r10 - 16) = r2;
> > call %[to_be_inlined_by_jit]
> "inline_by_jit" is misleading, it can be inlined by BPF verifier using
> BPF instructions, not just by BPF JIT

Will change

> > r2 = *(u64 *)(r10 - 16);
> > r1 = *(u64 *)(r10 - 8);
> > r0 = r1;
> > r0 += r2;
> > exit;

[...]

> > -static int find_subprog(struct bpf_verifier_env *env, int off)
> > +/* Find subprogram that contains instruction at 'off' */
> > +static int find_containing_subprog(struct bpf_verifier_env *env, int off)
> >  {
> > -        struct bpf_subprog_info *p;
> > +        struct bpf_subprog_info *vals = env->subprog_info;
> > +        int high = env->subprog_cnt - 1;
> > +        int low = 0, ret = -ENOENT;
> >
> > -        p = bsearch(&off, env->subprog_info, env->subprog_cnt,
> > -                    sizeof(env->subprog_info[0]), cmp_subprogs);
> > -        if (!p)
> > +        if (off >= env->prog->len || off < 0)
> >                  return -ENOENT;
> > -        return p - env->subprog_info;
> >
> > +        while (low <= high) {
> > +                int mid = (low + high)/2;
> styling nit: (...) / 2
> > +                struct bpf_subprog_info *val = &vals[mid];
> > +                int diff = off - val->start;
> > +
> > +                if (diff < 0) {
> tbh, this hurts my brain. Why not write human-readable and more meaningful
>
> if (off < val->start)
>
> ?

No reason, will change.

> > +                        high = mid - 1;
> > +                } else {
> > +                        low = mid + 1;
> > +                        /* remember last time mid.start <= off */
> > +                        ret = mid;
> > +                }
>
> feel free to ignore, but I find this unnecessary `ret = mid` part a
> bit inelegant. See find_linfo in kernel/bpf/log.c for how
> lower_bound-like binary search could be implemented without this (I
> mean the pattern where invariant keeps low or high as always
> satisfying the condition and the other one being adjusted with +1 or
> -1, depending on desired logic).

Will steal the code from there, thank you.
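Something like this, I assume (an untested sketch in the spirit of
find_linfo, not the final patch; it relies on subprog 0 always starting
at instruction 0, so 'low' satisfies the invariant from the start):

/* Sketch only: lower_bound-style search without 'ret' bookkeeping.
 * Invariant: vals[low].start <= off; 'high' shrinks towards the last
 * index that still satisfies it.
 */
static int find_containing_subprog(struct bpf_verifier_env *env, int off)
{
        struct bpf_subprog_info *vals = env->subprog_info;
        int low = 0, high = env->subprog_cnt - 1;

        if (off < 0 || off >= env->prog->len)
                return -ENOENT;

        while (low < high) {
                /* round up so that mid > low and the loop makes progress */
                int mid = low + (high - low + 1) / 2;

                if (vals[mid].start <= off)
                        low = mid;      /* mid still satisfies the invariant */
                else
                        high = mid - 1;
        }
        return low;
}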
[...]

> > +static u8 get_helper_reg_mask(const struct bpf_func_proto *fn)
> > +{
> > +        u8 mask;
> > +        int i;
> > +
> > +        if (!fn->nocsr)
> > +                return ALL_CALLER_SAVED_REGS;
> > +
> > +        mask = 0;
> > +        mask |= fn->ret_type == RET_VOID ? 0 : BIT(BPF_REG_0);
> > +        for (i = 0; i < ARRAY_SIZE(fn->arg_type); ++i)
> > +                mask |= fn->arg_type[i] == ARG_DONTCARE ? 0 : BIT(BPF_REG_1 + i);
> again subjective, but
>
> if (fn->ret_type != RET_VOID)
>     mask |= BIT(BPF_REG_0);
>
> (and similarly for ARG_DONTCARE)
>
> seems a bit more readable and not that much more verbose

Sure, will change.

[...]

> > +static bool verifier_inlines_helper_call(struct bpf_verifier_env *env, s32 imm)
> > +{
>
> note that there is now also bpf_jit_inlines_helper_call()

There is. I'd like authors of jit inline patches to explicitly opt in
to nocsr, vouching that the inline patch follows the nocsr contract.
The idea is that people would extend call_csr_mask() as necessary.

> >
> > +        return false;
> > +}
> > +
> > +/* If 'insn' is a call that follows no_caller_saved_registers contract
> > + * and called function is inlined by current jit, return a mask with
> > + * 1s corresponding to registers that are scratched by this call
> > + * (depends on return type and number of return parameters).
> > + * Otherwise return ALL_CALLER_SAVED_REGS mask.
> > + */
> > +static u32 call_csr_mask(struct bpf_verifier_env *env, struct bpf_insn *insn)
> > +{
> > +        const struct bpf_func_proto *fn;
> > +
> > +        if (bpf_helper_call(insn) &&
> > +            verifier_inlines_helper_call(env, insn->imm) &&
> strictly speaking, does nocsr have anything to do with inlining,
> though? E.g., if we know for sure (however, that's a separate issue)
> that helper implementation doesn't touch extra registers, why do we
> need inlining to make use of nocsr?

Technically, the alternative to inlining is for the C implementation of
the helper/kfunc itself to carry the no_caller_saved_registers
attribute. Grep shows a single function annotated as such in the kernel
tree: stackleak_track_stack(). Or, maybe, helpers/kfuncs implemented in
assembly. Whenever such helpers/kfuncs are added, this function could
be extended.
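To illustrate that alternative (a hypothetical example, not part of
this series): on targets where gcc/clang support the attribute (x86),
a helper could be compiled so that the compiler itself preserves the
caller-saved registers:

/* Hypothetical helper: the attribute makes the compiler save/restore
 * every caller-saved register the function touches, so a call to it
 * clobbers nothing beyond the argument/return registers -- the same
 * guarantee nocsr otherwise derives from verifier/jit inlining.
 */
__attribute__((no_caller_saved_registers))
static u64 bpf_example_helper(u64 arg)
{
        return arg + 1;
}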
> >
> > +            get_helper_proto(env, insn->imm, &fn) == 0 &&
> > +            fn->nocsr)
> > +                return ~get_helper_reg_mask(fn);
> > +
> > +        return ALL_CALLER_SAVED_REGS;
> > +}
> > +

[...]

> > +static int match_and_mark_nocsr_pattern(struct bpf_verifier_env *env, int t, bool mark)
> > +{
> > +        struct bpf_insn *insns = env->prog->insnsi, *stx, *ldx;
> > +        struct bpf_subprog_info *subprog;
> > +        u32 csr_mask = call_csr_mask(env, &insns[t]);
> > +        u32 reg_mask = ~csr_mask | ~ALL_CALLER_SAVED_REGS;
> > +        int s, i;
> > +        s16 off;
> > +
> > +        if (csr_mask == ALL_CALLER_SAVED_REGS)
> > +                return false;
> false -> 0 ?

Right.

> > +
> > +        for (i = 1, off = 0; i <= ARRAY_SIZE(caller_saved); ++i, off += BPF_REG_SIZE) {
> > +                if (t - i < 0 || t + i >= env->prog->len)
> > +                        break;
> > +                stx = &insns[t - i];
> > +                ldx = &insns[t + i];
> > +                if (off == 0) {
> > +                        off = stx->off;
> > +                        if (off % BPF_REG_SIZE != 0)
> > +                                break;
> Makes sense.
> > +                }
> > +                if (/* *(u64 *)(r10 - off) = r[0-5]? */
> > +                    stx->code != (BPF_STX | BPF_MEM | BPF_DW) ||
> > +                    stx->dst_reg != BPF_REG_10 ||
> > +                    /* r[0-5] = *(u64 *)(r10 - off)? */
> > +                    ldx->code != (BPF_LDX | BPF_MEM | BPF_DW) ||
> > +                    ldx->src_reg != BPF_REG_10 ||
> > +                    /* check spill/fill for the same reg and offset */
> > +                    stx->src_reg != ldx->dst_reg ||
> > +                    stx->off != ldx->off ||
> > +                    stx->off != off ||
> > +                    /* this should be a previously unseen register */
> > +                    BIT(stx->src_reg) & reg_mask)
> > +                        break;
> > +                reg_mask |= BIT(stx->src_reg);
> > +                if (mark) {
> > +                        env->insn_aux_data[t - i].nocsr_pattern = true;
> > +                        env->insn_aux_data[t + i].nocsr_pattern = true;
> > +                }
> > +        }
> > +        if (i == 1)
> > +                return 0;
> > +        if (mark) {
> > +                s = find_containing_subprog(env, t);
> > +                /* can't happen */
> > +                if (WARN_ON_ONCE(s < 0))
> > +                        return 0;
> > +                subprog = &env->subprog_info[s];
> > +                subprog->nocsr_stack_off = min(subprog->nocsr_stack_off, off);
> > +        }
>
> why not split pattern detection and all this other marking logic? You
> can return "the size of csr pattern", meaning how many spills/fills
> are there surrounding the call, no? Then all the marking can be done
> (if necessary) by the caller.

I'll try recording the pattern size for the function call, so no split
would be necessary.

> The question is what to do with zero pattern (no spills/fills for nocsr
> call, is that valid case?)

Zero pattern -- no spills/fills to remove, so nothing to do.
I will add a test case, but it should be handled fine by the current
implementation.

[...]

> > +/* Remove unnecessary spill/fill pairs, members of nocsr pattern.
> > + * Do this as a separate pass to avoid interfering with helper/kfunc
> > + * inlining logic in do_misc_fixups().
> > + * See comment for match_and_mark_nocsr_pattern().
> > + */
> > +static int remove_nocsr_spills_fills(struct bpf_verifier_env *env)
> > +{
> > +        struct bpf_subprog_info *subprogs = env->subprog_info;
> > +        int i, j, spills_num, cur_subprog = 0;
> > +        struct bpf_insn *insn = env->prog->insnsi;
> > +        int insn_cnt = env->prog->len;
> > +
> > +        for (i = 0; i < insn_cnt; i++, insn++) {
> > +                spills_num = match_nocsr_pattern(env, i);
>
> we can probably afford a single-byte field somewhere in
> bpf_insn_aux_data to remember "csr pattern size" instead of just a
> true/false fact that it is nocsr call. And so we wouldn't need to do
> pattern matching again here, we'll just have all the data.

Makes sense.

[...]
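E.g., something like this (rough sketch only; the field name
nocsr_spills_num, its width, and its placement in bpf_insn_aux_data are
placeholders, not the actual patch):

/* Placeholder field in struct bpf_insn_aux_data: */
struct bpf_insn_aux_data {
        /* ... existing fields ... */
        u8 nocsr_spills_num; /* number of spill/fill pairs in the nocsr
                              * pattern around this call,
                              * 0 if not a nocsr call
                              */
};

/* match_and_mark_nocsr_pattern() would record the size once; the
 * removal pass then just reads it back instead of re-matching:
 */
static int remove_nocsr_spills_fills(struct bpf_verifier_env *env)
{
        int i, spills_num;

        for (i = 0; i < env->prog->len; i++) {
                spills_num = env->insn_aux_data[i].nocsr_spills_num;
                if (!spills_num)
                        continue;
                /* ... remove spills_num spill/fill pairs around insn i ... */
        }
        return 0;
}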