Re: [RFC bpf-next v2 2/9] bpf: no_caller_saved_registers attribute for helper calls

On Wed, Jul 10, 2024 at 12:57 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> On Tue, 2024-07-09 at 23:01 -0700, Andrii Nakryiko wrote:
>
> [...]
>
> > > Right, it should be '<', my bad, will update the comment.
> >
> > See, I do read comments ;)
>
> Yes, sorry about that remark.
>

no worries :)

> [...]
>
> > > > > +/* See comment for mark_nocsr_pattern_for_call() */
> > > > > +static void check_nocsr_stack_contract(struct bpf_verifier_env *env, struct bpf_func_state *state,
> > > > > +                                      int insn_idx, int off)
> > > > > +{
> > > > > +       struct bpf_subprog_info *subprog = &env->subprog_info[state->subprogno];
> > > > > +       struct bpf_insn_aux_data *aux = &env->insn_aux_data[insn_idx];
> > > > > +
> > > > > +       if (subprog->nocsr_stack_off <= off || aux->nocsr_pattern)
> > > > > +               return;
> > > >
> > > > can helper call instruction go through this check? E.g., if we do
> > > > bpf_probe_read_kernel() into stack slot, where do we check that that
> > > > slot is not overlapping with nocsr spill/fill region?
> > >
> > > In check_helper_call() we do check_mem_access() that eventually calls
> > > one of the check_stack_{read,write}_{fixed,varying}_off().
> > > The .access_size should be set for bpf_probe_read_kernel()
> > > because its argument base type is ARG_PTR_TO_MEM.
> > > I will add a test case to double-check this.
> >
> > Ok. Also as I was reading this I didn't yet realize that
> > aux->nocsr_pattern is not set for call instruction itself, so I was
> > worried that we might accidentally skip the check. But we don't set
> > nocsr_pattern for call, so we should be good (but the test never
> > hurts)
>
> Well yes, there is a flag check, but a call to bpf_probe_read_kernel()
> touching a candidate nocsr stack address should disable nocsr rewrites
> for the current subprogram.

cool, let's have a test confirming correctness

>
> [...]
>
> > > > > +static u32 call_csr_mask(struct bpf_verifier_env *env, struct bpf_insn *insn)
> > > >
> > > > you use u8 for get_helper_reg_mask() and u32 here, why not keep them consistent?
> > >
> > > Ok
> > >
> > > > similar to the naming nit above, I think we should be a bit more
> > > > explicit with what "mask" actually means. Is this also clobber mask?
> > >
> > > I mean, there is a comment right above the function.
> > > This function returns a mask of caller saved registers (csr).
> > > I'll make the name more explicit.
> > >
> > > >
> > > > > +{
> > > > > +       const struct bpf_func_proto *fn;
> > > > > +
> > > > > +       if (bpf_helper_call(insn) &&
> > > > > +           (verifier_inlines_helper_call(env, insn->imm) || bpf_jit_inlines_helper_call(insn->imm)) &&
> > > > > +           get_helper_proto(env, insn->imm, &fn) == 0 &&
> > > > > +           fn->allow_nocsr)
> > > > > +               return ~get_helper_reg_mask(fn);
> > > >
> > > > hm... I'm a bit confused why we do a negation here? aren't we working
> > > > with clobbering mask... I'll keep reading for now.
> > >
> > > Please read the comment before the function.
> > >
> >
> > Believe it or not, I read it like 10 times and that didn't help me much.
> >
> > +/* If 'insn' is a call that follows no_caller_saved_registers contract
> > + * and called function is inlined by current jit or verifier,
> > + * return a mask with 1s corresponding to registers that are scratched
> > + * by this call (depends on return type and number of return parameters).
> > + * Otherwise return ALL_CALLER_SAVED_REGS mask.
> >
> > "registers that are scratched by this call" would be what
> > get_helper_reg_mask() returns for the function (at least that's my
> > reading of the above), and yet you return inverted mask. It doesn't
> > really help that we return ALL_CALLER_SAVED_REGS as "nope, it's not
> > nocsr call".
>
> I see, I messed up the comment...
>
> > Maybe it would be a bit easier to follow if call_csr_mask() returned
> > bool and mask as an out parameter in case it's nocsr call. So instead
> > of special casing ALL_CALLER_SAVED_REGS there would be a nice "not a
> > nocsr, never mind" and early exit.
>
> An out parameter seems like overkill. I'll change the names a little bit
> to make it easier to follow. If for v3 you still think that an out
> parameter is better, I'll switch then.

ok

>
> > Anyway, perhaps I'm just being very dense here; I just found this
> > particular piece extremely hard to follow intuitively.
> >
> > > >
> > > > > +
> > > > > +       return ALL_CALLER_SAVED_REGS;
> > > > > +}
> > >
> > > [...]
> > >
> > > > > +static void mark_nocsr_pattern_for_call(struct bpf_verifier_env *env, int t)
> > > > > +{
> > > > > +       struct bpf_insn *insns = env->prog->insnsi, *stx, *ldx;
> > > > > +       struct bpf_subprog_info *subprog;
> > > > > +       u32 csr_mask = call_csr_mask(env, &insns[t]);
> > > > > +       u32 reg_mask = ~csr_mask | ~ALL_CALLER_SAVED_REGS;
> > > >
> > > > tbh, I'm lost with all these bitmask and their inversions...
> > > > call_csr_mask()'s result is basically always used inverted, so why not
> > > > return inverted mask in the first place?
> > >
> > > The mask is initialized as a set of all registers preserved by this call.
> >
> > ok, maybe it's just a mix of "no csr" and "csr" that confuses me. How
> > about we call call_csr_mask as get_helper_preserved_mask() or
> > something like that to "match" get_helper_clobber_mask()?
>
> Yes, these two names are good, thank you.
> I'd still call it get_call_preserved_mask() as kfuncs might be added
> at some point.

sure, makes sense

>
> > > Those that are not in mask need a spill/fill pair.
> > > I'll toss things around to make this more clear.
> > > (naming, comments, maybe move the '| ~ALL_CALLER_SAVED_REGS' to the call_csr_mask()).
> > >
> > > >
> > > > > +       int s, i;
> > > > > +       s16 off;
> > > > > +
> > > > > +       if (csr_mask == ALL_CALLER_SAVED_REGS)
> > > > > +               return;
> > > > > +
> > > > > +       for (i = 1, off = 0; i <= ARRAY_SIZE(caller_saved); ++i, off += BPF_REG_SIZE) {
> > > > > +               if (t - i < 0 || t + i >= env->prog->len)
> > > > > +                       break;
> > > > > +               stx = &insns[t - i];
> > > > > +               ldx = &insns[t + i];
> > > > > +               if (off == 0) {
> > > > > +                       off = stx->off;
> > > > > +                       if (off % BPF_REG_SIZE != 0)
> > > > > +                               break;
> > > >
> > > > kind of ugly that we assume stx before we actually checked that it's
> > > > STX?... maybe split humongous if below into instruction checking
> > > > (with code and src_reg) and then off checking separately?
> > >
> > > Don't see anything ugly about this, tbh.
> >
> > you are using stx->off and do `% BPF_REG_SIZE` check on it while that
> > stx->off field might be meaningless for the instruction (because we
> > are not yet sure it's STX instruction), that's a bit ugly, IMO
>
> I can rewrite it like below:
>
>                 if (stx->code != (BPF_STX | BPF_MEM | BPF_DW) ||
>                     ldx->code != (BPF_LDX | BPF_MEM | BPF_DW))

I'd add stx->dst_reg != BPF_REG_10 and ldx->src_reg != BPF_REG_10
checks here and preserve the original comments with the instruction
assembly form.

(think about this as establishing that we are looking at the correct
shapes of instructions)

>                         break;
>                 off = off ?: stx->off; // or use the full 'if' block from the old version
>                 if (stx->dst_reg != BPF_REG_10 ||
>                     ldx->src_reg != BPF_REG_10 ||
>                     /* check spill/fill for the same reg and offset */
>                     stx->src_reg != ldx->dst_reg ||
>                     stx->off != ldx->off ||
>                     stx->off != off ||
>                     (off % BPF_REG_SIZE) != 0 ||
>                     /* this should be a previously unseen register */
>                     (BIT(stx->src_reg) & reg_mask))

and here we check all the correctness and the additional imposed
semantic invariants, knowing full well that we are dealing with
correct STX/LDX shapes

>                         break;
>
> But I'm not sure this adds any actual clarity.

I think it makes sense.

>
> >
> > > Can split the 'if' statement, if you think it's hard to read.
> >
> > it's not about readability for me
> >
> > >
> > > >
> > > > > +               }
> > > > > +               if (/* *(u64 *)(r10 - off) = r[0-5]? */
> > > > > +                   stx->code != (BPF_STX | BPF_MEM | BPF_DW) ||
> > > > > +                   stx->dst_reg != BPF_REG_10 ||
> > > > > +                   /* r[0-5] = *(u64 *)(r10 - off)? */
> > > > > +                   ldx->code != (BPF_LDX | BPF_MEM | BPF_DW) ||
> > > > > +                   ldx->src_reg != BPF_REG_10 ||
> > > > > +                   /* check spill/fill for the same reg and offset */
> > > > > +                   stx->src_reg != ldx->dst_reg ||
> > > > > +                   stx->off != ldx->off ||
> > > > > +                   stx->off != off ||
> > > > > +                   /* this should be a previously unseen register */
> > > > > +                   BIT(stx->src_reg) & reg_mask)
> > > >
> > > > () around & operation?
> > >
> > > No need, & has higher priority than ||.
> > > You can check the AST using
> > > https://tree-sitter.github.io/tree-sitter/playground .
> >
> > Oh, I have no doubt you checked that it works correctly. It's just not
> > a really good habit to rely on C's obscure operator precedence rules
> > beyond A && B || C (and even then it is arguable). I think the rule of
> > thumb to not mix bitwise and logical operators without explicit
> > parentheses makes sense.
> >
> > But never mind, I already feel weird complaining about !strcmp(),
> > every real kernel engineer should memorize operator precedence by
> > heart.
>
> I assumed you implied incorrect evaluation order.
> Will add parentheses.
>

thanks

> > > > > +                       break;
> > > > > +               reg_mask |= BIT(stx->src_reg);
> > > > > +               env->insn_aux_data[t - i].nocsr_pattern = 1;
> > > > > +               env->insn_aux_data[t + i].nocsr_pattern = 1;
> > > > > +       }
> > > > > +       if (i == 1)
> > > > > +               return;
> > > > > +       env->insn_aux_data[t].nocsr_spills_num = i - 1;
> > > > > +       s = find_containing_subprog(env, t);
> > > > > +       /* can't happen */
> [...]
> > > > > +       if (WARN_ON_ONCE(s < 0))
> > > > > +               return;
> > > > > +       subprog = &env->subprog_info[s];
> > > > > +       subprog->nocsr_stack_off = min(subprog->nocsr_stack_off, off);
> > > >
> > > > should this be max()? offsets are negative, right? so if nocsr uses -8
> > > > and -16 as in the example, entire [-16, 0) region is nocsr region
> > >
> > > This should be min exactly because stack offsets are negative.
> > > For the example above the 'off' is initialized as -16 and then
> > > is incremented by +8 giving final value of -8.
> > > And I need to select the minimal value used between several patterns.
> >
> > so let's say I have two nocsr calls in the same subprog
> >
> >     *(u64 *)(r10 - 8)  = r1;
> >     *(u64 *)(r10 - 16) = r2;
> >     call %[to_be_inlined]
> >     r2 = *(u64 *)(r10 - 16);
> >     r1 = *(u64 *)(r10 - 8);
> >
> >
> > and then a bit later
> >
> >
> >     *(u64 *)(r10 - 16)  = r1;
> >     *(u64 *)(r10 - 24) = r2;
> >     call %[to_be_inlined]
> >     r2 = *(u64 *)(r10 - 24);
> >     r1 = *(u64 *)(r10 - 16);
> >
> >
> > For the first chunk nocsr range is [-16, 0). For the second it's [-24,
> > -8), right?
> > Should `*(u64 *)(r10 - 8) = 123` somewhere in that subprog (not for
> > nocsr calls) invalidate this whole nocsr thing? With min you'll
> > basically have [-24, -8) as nocsr-reserved region, but shouldn't it be
> > whole [-24, 0)?
> >
> > Clang shouldn't generate such inconsistent offsets, right? But it's
> > legal, no? And if not, then all the calls should use the same upper
> > stack boundary and we shouldn't be doing min/max, but rather checking
> > that they are all consistent.
> >
> > Or what am I missing again?
>
> You don't miss anything.
>
> With the current algorithm the first call will not be rewritten because
> of the offset check in do_misc_fixups().
> The second call would be rewritten, and this is safe to do because the
> verifier knows that the range [-24,-8) is only used by nocsr patterns.
> What you suggest is slightly more pessimistic compared to the current
> algorithm.

Ok, I see, it wasn't obvious that you want this behavior. I actually
find it surprising (so at least let's call this possibility out).

But, tbh, I'd make this stricter. I'd dictate that within a subprogram
all no_csr patterns *should* end with the same stack offset, and I
would check for that (as I mentioned before: instead of min or max we
need an assignment, plus an equality check if we already assigned it
once).

Also, instead of doing that extra nocsr offset check in
do_misc_fixups(), why don't we just reset all no_csr patterns within a
subprogram *if we find a single violation*? The compiler shouldn't
ever emit such code, right? So let's be strict and fall back to not
recognizing nocsr.

And then we won't need that extra check in do_misc_fixups(), because
we eagerly unset the no_csr flag and will never hit that piece of
logic in patching.


I'd go even further and say that on any nocsr invariant violation, we
just go and reset all nocsr pattern flags across entire BPF program
(all subprogs including). In check_nocsr_stack_contract() I mean. Just
have a loop over all instructions and set that flag to false.

Why not? What realistic application would need to violate nocsr in
some subprogs but not in others?

KISS, or is it too simplistic for some reason?

>
> [...]
>
> > > > > +
> > > > > +                       /* apply the rewrite:
> > > > > +                        *   *(u64 *)(r10 - X) = rY ; num-times
> > > > > +                        *   call()                               -> call()
> > > > > +                        *   rY = *(u64 *)(r10 - X) ; num-times
> > > > > +                        */
> > > > > +                       err = verifier_remove_insns(env, i + delta - spills_num, spills_num);
> > > > > +                       if (err)
> > > > > +                               return err;
> > > > > +                       err = verifier_remove_insns(env, i + delta - spills_num + 1, spills_num);
> > > > > +                       if (err)
> > > > > +                               return err;
> > > >
> > > > why not a single bpf_patch_insn_data()?
> > >
> > > bpf_patch_insn_data() assumes that one instruction has to be replaced with many.
> > > Here I need to replace many instructions with a single instruction.
> > > I'd prefer not to tweak bpf_patch_insn_data() for this patch-set.
> >
> > ah, bpf_patch_insn_data() just does bpf_patch_insn_single(), somehow I
> > thought that it does range-for-range replacement of instructions.
> > Never mind then (it's a bit sad that we don't have a proper flexible
> > and powerful patching primitive, though, but oh well).
>
> That is true.
> I'll think about it once other issues with this patch-set are resolved.

yeah, it's completely unrelated, but having a bpf_patch_insns() that
takes an input instruction range to be replaced and replaces it with
another set of instructions would cover all existing use cases and
some more. We wouldn't need verifier_remove_insns(), because that's
just replacing existing instructions with an empty new set of
instructions. We would get an "insert instructions" primitive if we
specify an empty input range of instructions. If we add a flag for
whether to adjust jump offsets and make it configurable, we'd get the
thing that Alexei needed for may_goto without any extra hacks.

Furthermore, I find it wrong and silly that we keep having manual
delta+insn adjustments in every single piece of patching logic. I
think this should be done by bpf_patch_insns(). We need to have a
small "insn_patcher" struct that we use during instruction patching,
and a small instruction iterator on top that would keep returning next
instruction to process (and its index, probably). This will allow us
to optimize patching and instead of doing O(N) we can have something
faster and smarter (if we hide direct insn array accesses during
patching).

But this is all a completely separate story from all of this.

>
> [...]
>
> > > > > +
> > > > > +                       i += spills_num - 1;
> > > > > +                       /*   ^            ^   do a second visit of this instruction,
> > > > > +                        *   |            '-- so that verifier can inline it
> > > > > +                        *   '--------------- jump over deleted fills
> > > > > +                        */
> > > > > +                       delta -= 2 * spills_num;
> > > > > +                       insn = env->prog->insnsi + i + delta;
> > > > > +                       goto next_insn;
> > > >
> > > > why not adjust the state and just fall through, what goto next_insn
> > > > does that we can't (and next instruction is misleading, so I'd rather
> > > > fix up and move forward)
> > >
> > > I don't like this. The fall-through makes control flow more convoluted.
> > > To understand what would happen next:
> > > - with goto next_insn we just start over;
> > > - with fall-through we need to think about position of this particular
> > >   'if' statement within the loop.
> > >
> >
> > Alright, it's fine. Though I don't see anything wrong with knowing
> > that we patch up the nocsr pattern before we do inlining of calls. And
> > yes, the order is important, so what? I always assumed that the order
> > of operations in do_misc_fixups() is not arbitrary.
>
> On a second thought maybe fall-through is not that bad.
>
> [...]




