On Fri, Apr 07, 2023 at 01:54:03AM +0200, Kumar Kartikeya Dwivedi wrote: > On Thu, Apr 06, 2023 at 04:16:22AM CEST, Alexei Starovoitov wrote: > > On Wed, Apr 05, 2023 at 02:42:33AM +0200, Kumar Kartikeya Dwivedi wrote: > > > > > > - The exception state is represented using four booleans in the > > > task_struct of current task. Each boolean corresponds to the exception > > > state for each kernel context. This allows BPF programs to be > > > interrupted and still not clobber the other's exception state. > > > > that doesn't work for sleepable bpf progs and in RT for regular progs too. > > > > Can you elaborate? If a sleepable program blocks, that means the task is > scheduled out, so the next program will use the other task's task_struct. > Same for preemption for normal progs (under RT or not). I was worried about the case of the same task but different code paths in the kernel with tracing prog stepping on preempted lsm.s prog. I think you point that in this case they gotta be in different interrupt_context_level. I need to think it through a bit more. > > How about using a scratch space in prog->aux->exception[] instead of current task? > > > > I actually had this thought. It's even better because we can hardcode the > address of the exception state right in the program (since prog->aux remains > stable during bpf_patch_insn_data). exactly. > However, concurrent program invocations on > multiple CPUs doesn't work well with this. It's like, one program sets the state > while the other tries to check it. Right. If it asserts on one cpu all other cpus will unwind as well, since we're saying bpf_assert is for exceptions when user cannot convince the verifier that the program is correct. So it doesn't matter that it aborted everywhere. It's probably a good thing too. > It can be per-CPU but then we have to disable > preemption (which cannot be done). I was thinking to propose a per-cpu prog->aux->exception[] area. Sleepable and not are in migrate_disable(). bpf progs never migrate. So we can do it, but we'd need new pseudo this_cpu_ptr instruction and corresponding JIT support which felt like overkill. Another idea I contemplated is to do preempt_disable() and local_irq_save() into special field in prog->aux->exception first thing in bpf_throw() and then re-enable everything before entering exception cb. To avoid races with other progs, but NMI can still happen, so it's pointless. Just non-per-cpu prog->aux->exception seems good enough. > > > - Rewrites happen for bpf_throw and call instructions to subprogs. > > > The instructions which are executed in the main frame of the main > > > program (thus, not global functions and extension programs, which end > > > up executing in frame > 0) need to be rewritten differently. This is > > > tracked using BPF_THROW_OUTER vs BPF_THROW_INNER. If not done, a > > > > how about BPF_THROW_OUTER vs BPF_THROW_ANY_INNER ? > > would it be more precise ? > > I'm fine with the renaming. The only thing the type signifies is if we need to > do the rewrite for frame 0 vs frame N. Sure. BPF_THROW_FRAME_ZERO and BPF_THROW_FRAME_NON_ZERO also works. Or any name that shows that 2nd includes multiple cases. > > > > > +__bpf_kfunc notrace void bpf_throw(void) > > > +{ > > > + int i = interrupt_context_level(); > > > + > > > + current->bpf_exception_thrown[i] = true; > > > +} > > > > I think this needs to take u64 or couple u64 args and store them > > in the scratch area. > > bpf_assert* macros also need a way for bpf prog to specify > > the reason for the assertion. > > Otherwise there won't be any way to debug what happened. > > I agree. Should we force it to be a constant value? Then we can hardcode it in > the .text without having to save and restore it, but that might end up being a > little too restrictive? with prog->aux->exception approach run-time values works too.