On Fri, Apr 07, 2023 at 04:46:55AM +0200, Kumar Kartikeya Dwivedi wrote: > On Fri, Apr 07, 2023 at 04:11:36AM CEST, Alexei Starovoitov wrote: > > On Fri, Apr 07, 2023 at 01:54:03AM +0200, Kumar Kartikeya Dwivedi wrote: > > > On Thu, Apr 06, 2023 at 04:16:22AM CEST, Alexei Starovoitov wrote: > > > > On Wed, Apr 05, 2023 at 02:42:33AM +0200, Kumar Kartikeya Dwivedi wrote: > > > > > > > > > > - The exception state is represented using four booleans in the > > > > > task_struct of current task. Each boolean corresponds to the exception > > > > > state for each kernel context. This allows BPF programs to be > > > > > interrupted and still not clobber the other's exception state. > > > > > > > > that doesn't work for sleepable bpf progs and in RT for regular progs too. > > > > > > > > > > Can you elaborate? If a sleepable program blocks, that means the task is > > > scheduled out, so the next program will use the other task's task_struct. > > > Same for preemption for normal progs (under RT or not). > > > > I was worried about the case of the same task but different code paths > > in the kernel with tracing prog stepping on preempted lsm.s prog. > > I think you point that in this case they gotta be in different interrupt_context_level. > > I need to think it through a bit more. > > > > If there is nesting, the programs always clear their exception state on exit, so > the prog that calls into the kernel which then calls into the tracing prog etc. > won't see its exception state on return. The only path where attaching programs > would screw things up is when we see a thrown exception and start unwinding > (where clearing would be a problem since its done frame-by-frame). For that, I > already prevent _throwing_ fexit programs from attaching to subprogs in this > series (normal ones are still ok and supported, because fentry/fexit is > important for stats etc.). There might be some other corner cases I missed but > ensuring this property alone in general should make things work correctly. > > > > > How about using a scratch space in prog->aux->exception[] instead of current task? > > > > > > > > > > I actually had this thought. It's even better because we can hardcode the > > > address of the exception state right in the program (since prog->aux remains > > > stable during bpf_patch_insn_data). > > > > exactly. > > > > > However, concurrent program invocations on > > > multiple CPUs doesn't work well with this. It's like, one program sets the state > > > while the other tries to check it. > > > > Right. If it asserts on one cpu all other cpus will unwind as well, > > since we're saying bpf_assert is for exceptions when user cannot convince > > the verifier that the program is correct. > > So it doesn't matter that it aborted everywhere. It's probably a good thing too. > > > > We can discuss the semantics (this makes bpf_assert more stronger and basically > poisons the program globally in some sense), but implementation wise it's going > to be a lot more tricky to reason about correctness. > > Right now, the verifier follows paths and knows what resources are held when we > throw from a nested call chain (to complain on leaks). Callers doing the check > for exception state at runtime expect only certain throwing points to trigger > the check and rely on that for leak freedom. > > With a global prog->aux->exception, things will be ok for the CPU on which the > exception was thrown, but some other CPU will see the check returning true in a > caller even if the callee subprog for it did not throw and was possibly > transferring its acquired references to the caller after completing execution, > which now causes leaks (because subprogs are allowed to acquire and return to > their caller). > > The way to handle this would be that we assume every callee which throws may > also notionally throw right when returning (due to some other CPU's throw which > we may see). Then every exit from throwing callees may be processed as throwing > if we see the global state as set. > > However, this completely prevents subprogs from transferring some acquired > resource to their caller (which I think is too restrictive). If I'm acquiring > memory from static subprog and returning to my caller, I can't any longer since > I notionally throw when exiting and holding resources when doing bpf_throw is > disallowed, so transfer is out of the question. I was under impression that subprogs cannot acquire refs and transfer them to caller. Looks like your commit 9d9d00ac29d0 ("bpf: Fix reference state management for synchronous callbacks") allowed too much. I don't think it's a good idea to support coding patterns like: void my_alloc_foo(struct foo **ptr) { struct foo *p = bpf_obj_new(typeof(*p)); *ptr = p; } It's a correct C, of course, but do we really want to support such code? I don't think the verifier can fully support it anyway. That commit of yours allowed some of it in theory, but above example probably won't work, since 'transfer' isn't understood by the verifier. Regardless whether we tighten the verifier now or later such subprogs shouldn't be throwing. So I don't see an issue doing global prog->aux->exception.