Re: [PATCH RFC bpf-next v1 3/9] bpf: Implement bpf_throw kfunc

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Thu, 6 Apr 2023 19:11:36 -0700

On Fri, Apr 07, 2023 at 01:54:03AM +0200, Kumar Kartikeya Dwivedi wrote:
> On Thu, Apr 06, 2023 at 04:16:22AM CEST, Alexei Starovoitov wrote:
> > On Wed, Apr 05, 2023 at 02:42:33AM +0200, Kumar Kartikeya Dwivedi wrote:
> > >
> > > - The exception state is represented using four booleans in the
> > >   task_struct of current task. Each boolean corresponds to the exception
> > >   state for each kernel context. This allows BPF programs to be
> > >   interrupted and still not clobber the other's exception state.
> >
> > that doesn't work for sleepable bpf progs and in RT for regular progs too.
> >
> 
> Can you elaborate? If a sleepable program blocks, that means the task is
> scheduled out, so the next program will use the other task's task_struct.
> Same for preemption for normal progs (under RT or not).

I was worried about the case of the same task but different code paths
in the kernel with tracing prog stepping on preempted lsm.s prog.
I think you point that in this case they gotta be in different interrupt_context_level.
I need to think it through a bit more.

> > How about using a scratch space in prog->aux->exception[] instead of current task?
> >
> 
> I actually had this thought. It's even better because we can hardcode the
> address of the exception state right in the program (since prog->aux remains
> stable during bpf_patch_insn_data). 

exactly.

> However, concurrent program invocations on
> multiple CPUs doesn't work well with this. It's like, one program sets the state
> while the other tries to check it. 

Right. If it asserts on one cpu all other cpus will unwind as well,
since we're saying bpf_assert is for exceptions when user cannot convince
the verifier that the program is correct.
So it doesn't matter that it aborted everywhere. It's probably a good thing too.

> It can be per-CPU but then we have to disable
> preemption (which cannot be done).

I was thinking to propose a per-cpu prog->aux->exception[] area.
Sleepable and not are in migrate_disable(). bpf progs never migrate.
So we can do it, but we'd need new pseudo this_cpu_ptr instruction and
corresponding JIT support which felt like overkill.

Another idea I contemplated is to do preempt_disable() and local_irq_save()
into special field in prog->aux->exception first thing in bpf_throw()
and then re-enable everything before entering exception cb.
To avoid races with other progs, but NMI can still happen, so it's pointless.
Just non-per-cpu prog->aux->exception seems good enough.

> > > - Rewrites happen for bpf_throw and call instructions to subprogs.
> > >   The instructions which are executed in the main frame of the main
> > >   program (thus, not global functions and extension programs, which end
> > >   up executing in frame > 0) need to be rewritten differently. This is
> > >   tracked using BPF_THROW_OUTER vs BPF_THROW_INNER. If not done, a
> >
> > how about BPF_THROW_OUTER vs BPF_THROW_ANY_INNER ?
> > would it be more precise ?
> 
> I'm fine with the renaming. The only thing the type signifies is if we need to
> do the rewrite for frame 0 vs frame N.

Sure. BPF_THROW_FRAME_ZERO and BPF_THROW_FRAME_NON_ZERO also works.
Or any name that shows that 2nd includes multiple cases.

> >
> > > +__bpf_kfunc notrace void bpf_throw(void)
> > > +{
> > > +	int i = interrupt_context_level();
> > > +
> > > +	current->bpf_exception_thrown[i] = true;
> > > +}
> >
> > I think this needs to take u64 or couple u64 args and store them
> > in the scratch area.
> > bpf_assert* macros also need a way for bpf prog to specify
> > the reason for the assertion.
> > Otherwise there won't be any way to debug what happened.
> 
> I agree. Should we force it to be a constant value? Then we can hardcode it in
> the .text without having to save and restore it, but that might end up being a
> little too restrictive?

with prog->aux->exception approach run-time values works too.