Re: [RFC bpf-next v2 2/9] bpf: no_caller_saved_registers attribute for helper calls

Eduard Zingerman <eddyz87@xxxxxxxxx> · Wed, 10 Jul 2024 00:57:40 -0700

On Tue, 2024-07-09 at 23:01 -0700, Andrii Nakryiko wrote:

[...]

> > Right, it should be '<', my bad, will update the comment.
> 
> See, I do read comments ;)

Yes, sorry about that remark.

[...]

> > > > +/* See comment for mark_nocsr_pattern_for_call() */
> > > > +static void check_nocsr_stack_contract(struct bpf_verifier_env *env, struct bpf_func_state *state,
> > > > +                                      int insn_idx, int off)
> > > > +{
> > > > +       struct bpf_subprog_info *subprog = &env->subprog_info[state->subprogno];
> > > > +       struct bpf_insn_aux_data *aux = &env->insn_aux_data[insn_idx];
> > > > +
> > > > +       if (subprog->nocsr_stack_off <= off || aux->nocsr_pattern)
> > > > +               return;
> > > 
> > > can helper call instruction go through this check? E.g., if we do
> > > bpf_probe_read_kernel() into stack slot, where do we check that that
> > > slot is not overlapping with nocsr spill/fill region?
> > 
> > In check_helper_call() we do check_mem_access() that eventually calls
> > one of the check_stack_{read,write}_{fixed,varying}_off().
> > The .access_size should be set for bpf_probe_read_kernel()
> > because it's argument base type is ARG_PTR_TO_MEM.
> > I will add a test case to double-check this.
> 
> Ok. Also as I was reading this I didn't yet realize that
> aux->nocsr_pattern is not set for call instruction itself, so I was
> worried that we might accidentally skip the check. But we don't set
> nocsr_pattern for call, so we should be good (but the test never
> hurts)

Well yes, there is a flag check, but call to bpf_probe_read_kernel()
touching candidate nocsr stack address should disable nocsr rewrites
for current subprogram.

[...]

> > > > +static u32 call_csr_mask(struct bpf_verifier_env *env, struct bpf_insn *insn)
> > > 
> > > you use u8 for get_helper_reg_mask() and u32 here, why not keep them consistent?
> > 
> > Ok
> > 
> > > similar to the naming nit above, I think we should be a bit more
> > > explicit with what "mask" actually means. Is this also clobber mask?
> > 
> > I mean, there is a comment right above the function.
> > This function returns a mask of caller saved registers (csr).
> > I'll make the name more explicit.
> > 
> > > 
> > > > +{
> > > > +       const struct bpf_func_proto *fn;
> > > > +
> > > > +       if (bpf_helper_call(insn) &&
> > > > +           (verifier_inlines_helper_call(env, insn->imm) || bpf_jit_inlines_helper_call(insn->imm)) &&
> > > > +           get_helper_proto(env, insn->imm, &fn) == 0 &&
> > > > +           fn->allow_nocsr)
> > > > +               return ~get_helper_reg_mask(fn);
> > > 
> > > hm... I'm a bit confused why we do a negation here? aren't we working
> > > with clobbering mask... I'll keep reading for now.
> > 
> > Please read the comment before the function.
> > 
> 
> Believe it or not, but I read it like 10 times and that didn't help me much.
> 
> +/* If 'insn' is a call that follows no_caller_saved_registers contract
> + * and called function is inlined by current jit or verifier,
> + * return a mask with 1s corresponding to registers that are scratched
> + * by this call (depends on return type and number of return parameters).
> + * Otherwise return ALL_CALLER_SAVED_REGS mask.
> 
> "registers that are scratched by this call" would be what
> get_helper_reg_mask() returns for the function (at least that's my
> reading of the above), and yet you return inverted mask. It doesn't
> really help that we return ALL_CALLER_SAVED_REGS as "nope, it's not
> nocsr call".

I see, I messed up the comment...

> Maybe it would be a bit easier to follow if call_csr_mask() returned
> bool and mask as an out parameter in case it's nocsr call. So instead
> of special casing ALL_CALLER_SAVED_REGS there would be a nice "not a
> nocsr, never mind" and early exit.

Out parameter seems to be an overkill. I'll change names a little bit
to make it easier to follow. If for v3 you'd still think that out
parameter is better I'll switch then.

> Anyways, perhaps I'm just being very dense here, I just found this
> particular piece extremely hard to follow intuitively.
> 
> > > 
> > > > +
> > > > +       return ALL_CALLER_SAVED_REGS;
> > > > +}
> > 
> > [...]
> > 
> > > > +static void mark_nocsr_pattern_for_call(struct bpf_verifier_env *env, int t)
> > > > +{
> > > > +       struct bpf_insn *insns = env->prog->insnsi, *stx, *ldx;
> > > > +       struct bpf_subprog_info *subprog;
> > > > +       u32 csr_mask = call_csr_mask(env, &insns[t]);
> > > > +       u32 reg_mask = ~csr_mask | ~ALL_CALLER_SAVED_REGS;
> > > 
> > > tbh, I'm lost with all these bitmask and their inversions...
> > > call_csr_mask()'s result is basically always used inverted, so why not
> > > return inverted mask in the first place?
> > 
> > The mask is initialized as a set of all registers preserved by this call.
> 
> ok, maybe it's just a mix of "no csr" and "csr" that confuses me. How
> about we call call_csr_mask as get_helper_preserved_mask() or
> something like that to "match" get_helper_clobber_mask()?

Yes, these two names are good, thank you.
I'd still call it get_call_preserved_mask() as kfuncs might be added
at some point.

> > Those that are not in mask need a spill/fill pair.
> > I'll toss things around to make this more clear.
> > (naming, comments, maybe move the '| ~ALL_CALLER_SAVED_REGS' to the call_csr_mask()).
> > 
> > > 
> > > > +       int s, i;
> > > > +       s16 off;
> > > > +
> > > > +       if (csr_mask == ALL_CALLER_SAVED_REGS)
> > > > +               return;
> > > > +
> > > > +       for (i = 1, off = 0; i <= ARRAY_SIZE(caller_saved); ++i, off += BPF_REG_SIZE) {
> > > > +               if (t - i < 0 || t + i >= env->prog->len)
> > > > +                       break;
> > > > +               stx = &insns[t - i];
> > > > +               ldx = &insns[t + i];
> > > > +               if (off == 0) {
> > > > +                       off = stx->off;
> > > > +                       if (off % BPF_REG_SIZE != 0)
> > > > +                               break;
> > > 
> > > kind of ugly that we assume stx before we actually checked that it's
> > > STX?... maybe split humongous if below into instruction checking
> > > (with code and src_reg) and then off checking separately?
> > 
> > Don't see anything ugly about this, tbh.
> 
> you are using stx->off and do `% BPF_REG_SIZE` check on it while that
> stx->off field might be meaningless for the instruction (because we
> are not yet sure it's STX instruction), that's a bit ugly, IMO

I can rewrite it like below:

		if (stx->code != (BPF_STX | BPF_MEM | BPF_DW) ||
		    ldx->code != (BPF_LDX | BPF_MEM | BPF_DW))
			break;
		off = off != 0 ?: stx->off; // or use full 'if' block from the old version
		if (stx->dst_reg != BPF_REG_10 ||
		    ldx->src_reg != BPF_REG_10 ||
		    /* check spill/fill for the same reg and offset */
		    stx->src_reg != ldx->dst_reg ||
		    stx->off != ldx->off ||
		    stx->off != off ||
		    (off % BPF_REG_SIZE) != 0 ||
		    /* this should be a previously unseen register */
		    (BIT(stx->src_reg) & reg_mask))
			break;

But I'm not sure this adds any actual clarity.

> 
> > Can split the 'if' statement, if you think it's hard to read.
> 
> it's not about readability for me
> 
> > 
> > > 
> > > > +               }
> > > > +               if (/* *(u64 *)(r10 - off) = r[0-5]? */
> > > > +                   stx->code != (BPF_STX | BPF_MEM | BPF_DW) ||
> > > > +                   stx->dst_reg != BPF_REG_10 ||
> > > > +                   /* r[0-5] = *(u64 *)(r10 - off)? */
> > > > +                   ldx->code != (BPF_LDX | BPF_MEM | BPF_DW) ||
> > > > +                   ldx->src_reg != BPF_REG_10 ||
> > > > +                   /* check spill/fill for the same reg and offset */
> > > > +                   stx->src_reg != ldx->dst_reg ||
> > > > +                   stx->off != ldx->off ||
> > > > +                   stx->off != off ||
> > > > +                   /* this should be a previously unseen register */
> > > > +                   BIT(stx->src_reg) & reg_mask)
> > > 
> > > () around & operation?
> > 
> > No need, & has higher priority over ||.
> > You can check the AST using
> > https://tree-sitter.github.io/tree-sitter/playground .
> 
> Oh, I have no doubt you checked that it works correctly. It's just not
> a really good habit to rely on C's obscure operation ordering rules
> beyond A && B || C (and even then it is arguable). I think the rule of
> thumb to not mix bitwise and logic operators without explicit
> parenthesis makes sense.
> 
> But never mind, I already feel weird complaining about !strcmp(),
> every real kernel engineer should memorize operator precedence by
> heart.

I assumed you implied incorrect evaluation order.
Will add parenthesis.

> > > > +                       break;
> > > > +               reg_mask |= BIT(stx->src_reg);
> > > > +               env->insn_aux_data[t - i].nocsr_pattern = 1;
> > > > +               env->insn_aux_data[t + i].nocsr_pattern = 1;
> > > > +       }
> > > > +       if (i == 1)
> > > > +               return;
> > > > +       env->insn_aux_data[t].nocsr_spills_num = i - 1;
> > > > +       s = find_containing_subprog(env, t);
> > > > +       /* can't happen */
[...]
> > > > +       if (WARN_ON_ONCE(s < 0))
> > > > +               return;
> > > > +       subprog = &env->subprog_info[s];
> > > > +       subprog->nocsr_stack_off = min(subprog->nocsr_stack_off, off);
> > > 
> > > should this be max()? offsets are negative, right? so if nocsr uses -8
> > > and -16 as in the example, entire [-16, 0) region is nocsr region
> > 
> > This should be min exactly because stack offsets are negative.
> > For the example above the 'off' is initialized as -16 and then
> > is incremented by +8 giving final value of -8.
> > And I need to select the minimal value used between several patterns.
> 
> so let's say I have two nocsr calls in the same subprog
> 
>     *(u64 *)(r10 - 8)  = r1;
>     *(u64 *)(r10 - 16) = r2;
>     call %[to_be_inlined]
>     r2 = *(u64 *)(r10 - 16);
>     r1 = *(u64 *)(r10 - 8);
> 
> 
> and then a bit later
> 
> 
>     *(u64 *)(r10 - 16)  = r1;
>     *(u64 *)(r10 - 24) = r2;
>     call %[to_be_inlined]
>     r2 = *(u64 *)(r10 - 24);
>     r1 = *(u64 *)(r10 - 16);
> 
> 
> For the first chunk nocsr range is [-16, 0). For the second it's [-24,
> -8), right?
> Should `*(u64 *)(r10 - 8) = 123` somewhere in that subprog (not for
> nocsr calls) invalidate this whole nocsr thing? With min you'll
> basically have [-24, -8) as nocsr-reserved region, but shouldn't it be
> whole [-24, 0)?
> 
> Clang shouldn't generate such inconsistent offset, right? But it's
> legal, no? And if not, then all the calls should use the same upper
> stack boundary and we shouldn't be doing min/max, but rather checking
> that they are all consistent.
> 
> Or what am I missing again?

You don't miss anything.

With the current algorithm first call will not be rewritten because of
the offset check in do_misc_fixups().
The second call would be rewritten and this is safe to do, because
verifier knows that range [-24,-8) is only used by nocsr patterns.
What you suggest is slightly more pessimistic, compared to current
algorithm.

[...]

> > > > +
> > > > +                       /* apply the rewrite:
> > > > +                        *   *(u64 *)(r10 - X) = rY ; num-times
> > > > +                        *   call()                               -> call()
> > > > +                        *   rY = *(u64 *)(r10 - X) ; num-times
> > > > +                        */
> > > > +                       err = verifier_remove_insns(env, i + delta - spills_num, spills_num);
> > > > +                       if (err)
> > > > +                               return err;
> > > > +                       err = verifier_remove_insns(env, i + delta - spills_num + 1, spills_num);
> > > > +                       if (err)
> > > > +                               return err;
> > > 
> > > why not a single bpf_patch_insn_data()?
> > 
> > bpf_patch_insn_data() assumes that one instruction has to be replaced with many.
> > Here I need to replace many instructions with a single instruction.
> > I'd prefer not to tweak bpf_patch_insn_data() for this patch-set.
> 
> ah, bpf_patch_insn_data just does bpf_patch_insn_single, somehow I
> thought that it does range for range replacement of instructions.
> Never mind then (it's a bit sad that we don't have a proper flexible
> and powerful patching primitive, though, but oh well).

That is true.
I'll think about it once other issues with this patch-set are resolved.

[...]

> > > > +
> > > > +                       i += spills_num - 1;
> > > > +                       /*   ^            ^   do a second visit of this instruction,
> > > > +                        *   |            '-- so that verifier can inline it
> > > > +                        *   '--------------- jump over deleted fills
> > > > +                        */
> > > > +                       delta -= 2 * spills_num;
> > > > +                       insn = env->prog->insnsi + i + delta;
> > > > +                       goto next_insn;
> > > 
> > > why not adjust the state and just fall through, what goto next_insn
> > > does that we can't (and next instruction is misleading, so I'd rather
> > > fix up and move forward)
> > 
> > I don't like this. The fall-through makes control flow more convoluted.
> > To understand what would happen next:
> > - with goto next_insn we just start over;
> > - with fall-through we need to think about position of this particular
> >   'if' statement within the loop.
> > 
> 
> Alright, it's fine. Though I don't see anything wrong with knowing
> that we patch up nocsr pattern before we do inlining of calls. And
> yes, order is important, so what? I always assumed that the order of
> operations in do_misc_fixups() is not arbitrary.

On a second thought maybe fall-through is not that bad.

[...]