Re: [PATCH RFC bpf-next v1 3/9] bpf: Implement bpf_throw kfunc

Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> · Fri, 7 Apr 2023 01:54:03 +0200

On Thu, Apr 06, 2023 at 04:16:22AM CEST, Alexei Starovoitov wrote:
> On Wed, Apr 05, 2023 at 02:42:33AM +0200, Kumar Kartikeya Dwivedi wrote:
> >
> > - The exception state is represented using four booleans in the
> >   task_struct of current task. Each boolean corresponds to the exception
> >   state for each kernel context. This allows BPF programs to be
> >   interrupted and still not clobber the other's exception state.
>
> that doesn't work for sleepable bpf progs and in RT for regular progs too.
>

Can you elaborate? If a sleepable program blocks, that means the task is
scheduled out, so the next program will use the other task's task_struct.
Same for preemption for normal progs (under RT or not).

Is there something else that I'm missing?

> > - The other vexing case is of recursion. If a program calls into another
> >   program (e.g. call into helper which invokes tracing program
> >   eventually), it may throw and clobber the current exception state. To
> >   avoid this, an invariant is maintained across the implementation:
> > 	Exception state is always cleared on entry and exit of the main
> > 	BPF program.
> >   This implies that if recursion occurs, the BPF program will clear the
> >   current exception state on entry and exit. However, callbacks do not
> >   do the same, because they are subprograms. The case for propagating
> >   exceptions of callbacks invoked by the kernel back to the BPF program
> >   is handled in the next commit. This is also the main reason to clear
> >   exception state on entry, asynchronous callbacks can clobber exception
> >   state even though we make sure it's always set to be 0 within the
> >   kernel.
> >   Anyhow, the only other thing to be kept in mind is to never allow a
> >   BPF program to execute when the program is being unwinded. This
> >   implies that every function involved in this path must be notrace,
> >   which is the case for bpf_throw, bpf_get_exception and
> >   bpf_reset_exception.
>
> ...
>
> > +			struct bpf_insn entry_insns[] = {
> > +				BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
> > +				BPF_EMIT_CALL(bpf_reset_exception),
> > +				BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
> > +				insn[i],
> > +			};
>
> Is not correct in global bpf progs that take more than 1 argument.
>

But this is not done for global subprogs, only for the main subprog, it only
needs to be done when we enter the program from the kernel.

> How about using a scratch space in prog->aux->exception[] instead of current task?
>

I actually had this thought. It's even better because we can hardcode the
address of the exception state right in the program (since prog->aux remains
stable during bpf_patch_insn_data). However, concurrent program invocations on
multiple CPUs doesn't work well with this. It's like, one program sets the state
while the other tries to check it. It can be per-CPU but then we have to disable
preemption (which cannot be done).

Unfortunately per-task state seemed like the only choice which works without
complicating things too much.

> > +notrace u64 bpf_get_exception(void)
> > +{
> > +	int i = interrupt_context_level();
> > +
> > +	return current->bpf_exception_thrown[i];
> > +}
>
> this is too slow to be acceptable.

I agree, also partly why I still marked this an RFC.

> it needs to be single load plus branch.
> with prog->aux->exception approach we can achieve that.
> Instead of inserting a call to bpf_get_exception() we can do load+cmp.
> We probably should pass prog->aux into exception callback, so it
> can know where throw came from.
>

IMO prog->aux->exception won't work either (unless I'm missing some way which
you can point out). The other option would be that we spill pointer to the
per-task exception state to the program's stack on entry, and then every check
loads the value and performs the check. It will be a load from stack, load from
memory and then a jump instruction. Still not as good as a direct load though,
which we'd have with prog->aux, but much better than the current state.

> > - Rewrites happen for bpf_throw and call instructions to subprogs.
> >   The instructions which are executed in the main frame of the main
> >   program (thus, not global functions and extension programs, which end
> >   up executing in frame > 0) need to be rewritten differently. This is
> >   tracked using BPF_THROW_OUTER vs BPF_THROW_INNER. If not done, a
>
> how about BPF_THROW_OUTER vs BPF_THROW_ANY_INNER ?
> would it be more precise ?

I'm fine with the renaming. The only thing the type signifies is if we need to
do the rewrite for frame 0 vs frame N.

>
> > +__bpf_kfunc notrace void bpf_throw(void)
> > +{
> > +	int i = interrupt_context_level();
> > +
> > +	current->bpf_exception_thrown[i] = true;
> > +}
>
> I think this needs to take u64 or couple u64 args and store them
> in the scratch area.
> bpf_assert* macros also need a way for bpf prog to specify
> the reason for the assertion.
> Otherwise there won't be any way to debug what happened.

I agree. Should we force it to be a constant value? Then we can hardcode it in
the .text without having to save and restore it, but that might end up being a
little too restrictive?