Re: [RFC bpf-next v2 2/9] bpf: no_caller_saved_registers attribute for helper calls

On Wed, Jul 10, 2024 at 11:41 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> On Wed, 2024-07-10 at 10:50 -0700, Andrii Nakryiko wrote:
>
> [...]
>
> > > > Ok, I see, this wasn't obvious that you want this behavior. I actually
> > > > find it surprising (and so at least let's call this possibility out).
> > > >
> > > > But, tbh, I'd make this stricter. I'd dictate that within a subprogram
> > > > all no_csr patterns *should* end with the same stack offset and I
> > > > would check for that (so what I mentioned before that instead of min
> > > > or max we need assignment and check for equality if we already
> > > > assigned it once).
> > >
> > > This makes sense, but does not cover a theoretical corner case:
> > > - suppose there are two nocsr functions A and B on the kernel side,
> > >   but only A was marked as nocsr during program compilation;
> > > - the spill/fill "bracket" was generated for a call to function B by
> > >   the compiler (because this is just a valid codegen).
> >
> > In this case the call to A shouldn't be changing nocsr_offset at all,
> > though? You should find no spill/fill and thus even though A is
> > *allowed* to use no_csr, it doesn't, so it should have no effect on
> > nocsr offsets, no?
>
> Consider the following code:
>
>     *(u64 *)(r10 - 24) = r1;
>     call A                     // kernel and clang know that A is nocsr
>     r1 = *(u64 *)(r10 - 24);   // kernel assumes .nocsr_stack_offset -24
>     ...
>     *(u64 *)(r10 - 16) = r1;
>     call B                     // only kernel knows that B is nocsr
>     r1 = *(u64 *)(r10 - 16);   // with stricter logic this would disable
>                                // nocsr for the whole sub-program

oh, you mean that r10-16 accesses are valid instructions emitted by
the compiler but not due to nocsr? I mean, tough luck?... We shouldn't
reject this, but nocsr is ultimately a performance optimization, so
not critical if it doesn't work within some subprogram.

>
> > But your example actually made me think about another (not theoretical
> > at all) case. What if we have calls to A and B, the kernel is slightly
> > old and knows that B is nocsr, but A is not. But the BPF program was
> > compiled with the latest helper/kfunc definitions marking A as no_csr
> > eligible (as well as B). (btw, THAT'S THE WORD for allow_csr --
> > ELIGIBLE. csr_eligible FTW! but I digress...)
> >
> > With the case above we'll disable nocsr for both A and B, right? That
> > sucks, but not sure if we can do anything about that. (We can probably
> > assume no_csr pattern and thus allow spill/fill and not disable nocsr
> > in general, but not remove spill/fills... a bit more complication
> > upfront for longer term extensibility.. not sure, maybe performance
> > regression is a fine price, hmm)
> >
> > So I take it back about unmarking csr in the *entire BPF program*,
> > let's just limit it to the subprog scope. But I still think we should
> > do it eagerly, rather than double-checking in do_misc_fixups().
> > WDYT?
>
> With current algo this situation would disable nocsr indeed.
> The problem is that the check for spilled stack slot usage is too simplistic.
> However, it could be covered if the check is performed using e.g.
> process I described earlier:
> - for spill, remember the defining instruction in the stack slot structure;
> - for fill, "merge" the defining instruction index;
> - for other stack access mark defining instruction as escaping.

Sorry, I have no idea what the above means and implies: "defining
instruction", "merge", "escaping".

As I mentioned above, I'd keep it as simple as reasonably possible.

>
> > > So I wanted to keep the nocsr check a little bit more permissive.
> > >
> > > > Also, instead of doing that extra nocsr offset check in
> > > > do_misc_fixups(), why don't we just reset all no_csr patterns within a
> > > > subprogram *if we find a single violation*. Compiler shouldn't ever
> > > > emit such code, right? So let's be strict and fallback to not
> > > > recognizing nocsr.
> > > >
> > > > And then we won't need that extra check in do_misc_fixups() because we
> > > > eagerly unset no_csr flag and will never hit that piece of logic in
> > > > patching.
> > >
> > > I can do that, but the detector pass would have to be two-pass:
> > > - on the first pass, find the nocsr_stack_off, add candidate insn marks;
> > > - on the second pass, remove marks from insns with wrong stack access offset.
> >
> > It's not really a second pass, it's part of normal validation.
> > check_nocsr_stack_contract() will detect this and will do, yes, pass
> > over all instructions of a subprogram to unmark them.
>
> - on the first pass true .nocsr_stack_off is not yet known,
>   so .nocsr_pattern is set optimistically;
> - on the second pass .nocsr_stack_off is already known,
>   so .nocsr_pattern can be removed from spills/fills outside the range;
> - check_nocsr_stack_contract() does not need to scan full sub-program
>   on each violation, it can set a flag disabling nocsr in subprogram info.

I'm not even sure anymore if we are agreeing or disagreeing, let's
just look at the next revision or something, I'm getting lost.

I was saying that as soon as check_nocsr_stack_contract() detects that
the contract is breached, we eagerly set nocsr_pattern and
nocsr_spills_num to 0 for all instructions within the current subprog
and set nocsr_stack_off in the subprog info to 0 (there is no more
csr). There will be no other violations after that within that
subprog. Maybe that's what you mean as well and I just can't read.

>
> > > Otherwise I can't discern true patterns from false positives in
> > > situation described above.
> >
> > See above, I might be missing what your "theoretical" example changes.
> >
> > >
> > > > I'd go even further and say that on any nocsr invariant violation, we
> > > > just go and reset all nocsr pattern flags across entire BPF program
> > > > (all subprogs including). In check_nocsr_stack_contract() I mean. Just
> > > > have a loop over all instructions and set that flag to false.
> > > >
> > > > Why not? What realistic application would need to violate nocsr in
> > > > some subprogs but not in others?
> > > >
> > > > KISS or it's too simplistic for some reason?
> > >
> > > I can invalidate nocsr per-program.
> > > If so, I would use an "allow_nocsr" flag in program aux to avoid
> > > additional passes over instructions.
> >
> > yep, see above (but let's keep it per-subprog for now)
>
> Ok
>
> > > > > > > bpf_patch_insn_data() assumes that one instruction has to be replaced with many.
> > > > > > > Here I need to replace many instructions with a single instruction.
> > > > > > > I'd prefer not to tweak bpf_patch_insn_data() for this patch-set.
> > > > > >
> > > > > > ah, bpf_patch_insn_data just does bpf_patch_insn_single, somehow I
> > > > > > thought that it does range-for-range replacement of instructions.
> > > > > > Never mind then (it's a bit sad that we don't have a proper flexible
> > > > > > and powerful patching primitive, though, but oh well).
> > > > >
> > > > > That is true.
> > > > > I'll think about it once other issues with this patch-set are resolved.
> > > >
> > > > yeah, it's completely unrelated, but having a bpf_patch_insns that
> > > > takes input instruction range to be replaced and replacing it with
> > > > another set of instructions would cover all existing use cases and
> > > > some more. We wouldn't need verifier_remove_insns(), because that's
> > > > just replacing existing instructions with empty new set of
> > > > instructions. We would now have "insert instructions" primitive if we
> > > > specify empty input range of instructions. If we add a flag whether to
> > > > adjust jump offsets and make it configurable, we'd need the thing that
> > > > Alexei needed for may_goto without any extra hacks.
> > > >
> > > > Furthermore, I find it wrong and silly that we keep having manual
> > > > delta+insn adjustments in every single piece of patching logic. I
> > > > think this should be done by bpf_patch_insns(). We need to have a
> > > > small "insn_patcher" struct that we use during instruction patching,
> > > > and a small instruction iterator on top that would keep returning next
> > > > instruction to process (and its index, probably). This will allow us
> > > > to optimize patching and instead of doing O(N) we can have something
> > > > faster and smarter (if we hide direct insn array accesses during
> > > > patching).
> > > >
> > > > But this is all a completely separate story from all of this.
> > >
> > > I actually suggested something along these lines a very long time ago:
> > > https://lore.kernel.org/bpf/5c297e5009bcd4f0becf59ccd97cfd82bb3a49ec.camel@xxxxxxxxx/
> >
> > We had a few discussions around patching improvements (you can see I
> > mentioned that in that thread as well). We all like to talk about
> > improving this, but nobody ever does anything about it. :) Tech debt
> > at its best.
>
> If you are not completely opposed to that idea I can follow-up with a
> complete version as an RFC.

let's do one thing at a time, there is plenty to review for nocsr
without having to switch to patching.




