On Thu, Apr 06, 2023 at 04:16:22AM CEST, Alexei Starovoitov wrote: > On Wed, Apr 05, 2023 at 02:42:33AM +0200, Kumar Kartikeya Dwivedi wrote: > > > > - The exception state is represented using four booleans in the > > task_struct of current task. Each boolean corresponds to the exception > > state for each kernel context. This allows BPF programs to be > > interrupted and still not clobber the other's exception state. > > that doesn't work for sleepable bpf progs and in RT for regular progs too. > Can you elaborate? If a sleepable program blocks, that means the task is scheduled out, so the next program will use the other task's task_struct. Same for preemption for normal progs (under RT or not). Is there something else that I'm missing? > > - The other vexing case is of recursion. If a program calls into another > > program (e.g. call into helper which invokes tracing program > > eventually), it may throw and clobber the current exception state. To > > avoid this, an invariant is maintained across the implementation: > > Exception state is always cleared on entry and exit of the main > > BPF program. > > This implies that if recursion occurs, the BPF program will clear the > > current exception state on entry and exit. However, callbacks do not > > do the same, because they are subprograms. The case for propagating > > exceptions of callbacks invoked by the kernel back to the BPF program > > is handled in the next commit. This is also the main reason to clear > > exception state on entry, asynchronous callbacks can clobber exception > > state even though we make sure it's always set to be 0 within the > > kernel. > > Anyhow, the only other thing to be kept in mind is to never allow a > > BPF program to execute when the program is being unwinded. This > > implies that every function involved in this path must be notrace, > > which is the case for bpf_throw, bpf_get_exception and > > bpf_reset_exception. > > ... > > > + struct bpf_insn entry_insns[] = { > > + BPF_MOV64_REG(BPF_REG_6, BPF_REG_1), > > + BPF_EMIT_CALL(bpf_reset_exception), > > + BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), > > + insn[i], > > + }; > > Is not correct in global bpf progs that take more than 1 argument. > But this is not done for global subprogs, only for the main subprog, it only needs to be done when we enter the program from the kernel. > How about using a scratch space in prog->aux->exception[] instead of current task? > I actually had this thought. It's even better because we can hardcode the address of the exception state right in the program (since prog->aux remains stable during bpf_patch_insn_data). However, concurrent program invocations on multiple CPUs doesn't work well with this. It's like, one program sets the state while the other tries to check it. It can be per-CPU but then we have to disable preemption (which cannot be done). Unfortunately per-task state seemed like the only choice which works without complicating things too much. > > +notrace u64 bpf_get_exception(void) > > +{ > > + int i = interrupt_context_level(); > > + > > + return current->bpf_exception_thrown[i]; > > +} > > this is too slow to be acceptable. I agree, also partly why I still marked this an RFC. > it needs to be single load plus branch. > with prog->aux->exception approach we can achieve that. > Instead of inserting a call to bpf_get_exception() we can do load+cmp. > We probably should pass prog->aux into exception callback, so it > can know where throw came from. > IMO prog->aux->exception won't work either (unless I'm missing some way which you can point out). The other option would be that we spill pointer to the per-task exception state to the program's stack on entry, and then every check loads the value and performs the check. It will be a load from stack, load from memory and then a jump instruction. Still not as good as a direct load though, which we'd have with prog->aux, but much better than the current state. > > - Rewrites happen for bpf_throw and call instructions to subprogs. > > The instructions which are executed in the main frame of the main > > program (thus, not global functions and extension programs, which end > > up executing in frame > 0) need to be rewritten differently. This is > > tracked using BPF_THROW_OUTER vs BPF_THROW_INNER. If not done, a > > how about BPF_THROW_OUTER vs BPF_THROW_ANY_INNER ? > would it be more precise ? I'm fine with the renaming. The only thing the type signifies is if we need to do the rewrite for frame 0 vs frame N. > > > +__bpf_kfunc notrace void bpf_throw(void) > > +{ > > + int i = interrupt_context_level(); > > + > > + current->bpf_exception_thrown[i] = true; > > +} > > I think this needs to take u64 or couple u64 args and store them > in the scratch area. > bpf_assert* macros also need a way for bpf prog to specify > the reason for the assertion. > Otherwise there won't be any way to debug what happened. I agree. Should we force it to be a constant value? Then we can hardcode it in the .text without having to save and restore it, but that might end up being a little too restrictive?