On Wed, Apr 12, 2023 at 09:36:12PM CEST, Alexei Starovoitov wrote: > On Fri, Apr 07, 2023 at 04:46:55AM +0200, Kumar Kartikeya Dwivedi wrote: > > On Fri, Apr 07, 2023 at 04:11:36AM CEST, Alexei Starovoitov wrote: > > > On Fri, Apr 07, 2023 at 01:54:03AM +0200, Kumar Kartikeya Dwivedi wrote: > > > > On Thu, Apr 06, 2023 at 04:16:22AM CEST, Alexei Starovoitov wrote: > > > > > On Wed, Apr 05, 2023 at 02:42:33AM +0200, Kumar Kartikeya Dwivedi wrote: > > > > > > > > > > > > - The exception state is represented using four booleans in the > > > > > > task_struct of current task. Each boolean corresponds to the exception > > > > > > state for each kernel context. This allows BPF programs to be > > > > > > interrupted and still not clobber the other's exception state. > > > > > > > > > > that doesn't work for sleepable bpf progs and in RT for regular progs too. > > > > > > > > > > > > > Can you elaborate? If a sleepable program blocks, that means the task is > > > > scheduled out, so the next program will use the other task's task_struct. > > > > Same for preemption for normal progs (under RT or not). > > > > > > I was worried about the case of the same task but different code paths > > > in the kernel with tracing prog stepping on preempted lsm.s prog. > > > I think you point that in this case they gotta be in different interrupt_context_level. > > > I need to think it through a bit more. > > > > > > > If there is nesting, the programs always clear their exception state on exit, so > > the prog that calls into the kernel which then calls into the tracing prog etc. > > won't see its exception state on return. The only path where attaching programs > > would screw things up is when we see a thrown exception and start unwinding > > (where clearing would be a problem since its done frame-by-frame). For that, I > > already prevent _throwing_ fexit programs from attaching to subprogs in this > > series (normal ones are still ok and supported, because fentry/fexit is > > important for stats etc.). There might be some other corner cases I missed but > > ensuring this property alone in general should make things work correctly. > > > > > > > How about using a scratch space in prog->aux->exception[] instead of current task? > > > > > > > > > > > > > I actually had this thought. It's even better because we can hardcode the > > > > address of the exception state right in the program (since prog->aux remains > > > > stable during bpf_patch_insn_data). > > > > > > exactly. > > > > > > > However, concurrent program invocations on > > > > multiple CPUs doesn't work well with this. It's like, one program sets the state > > > > while the other tries to check it. > > > > > > Right. If it asserts on one cpu all other cpus will unwind as well, > > > since we're saying bpf_assert is for exceptions when user cannot convince > > > the verifier that the program is correct. > > > So it doesn't matter that it aborted everywhere. It's probably a good thing too. > > > > > > > We can discuss the semantics (this makes bpf_assert more stronger and basically > > poisons the program globally in some sense), but implementation wise it's going > > to be a lot more tricky to reason about correctness. > > > > Right now, the verifier follows paths and knows what resources are held when we > > throw from a nested call chain (to complain on leaks). Callers doing the check > > for exception state at runtime expect only certain throwing points to trigger > > the check and rely on that for leak freedom. > > > > With a global prog->aux->exception, things will be ok for the CPU on which the > > exception was thrown, but some other CPU will see the check returning true in a > > caller even if the callee subprog for it did not throw and was possibly > > transferring its acquired references to the caller after completing execution, > > which now causes leaks (because subprogs are allowed to acquire and return to > > their caller). > > > > The way to handle this would be that we assume every callee which throws may > > also notionally throw right when returning (due to some other CPU's throw which > > we may see). Then every exit from throwing callees may be processed as throwing > > if we see the global state as set. > > > > However, this completely prevents subprogs from transferring some acquired > > resource to their caller (which I think is too restrictive). If I'm acquiring > > memory from static subprog and returning to my caller, I can't any longer since > > I notionally throw when exiting and holding resources when doing bpf_throw is > > disallowed, so transfer is out of the question. > > I was under impression that subprogs cannot acquire refs and transfer them > to caller. > Looks like your commit 9d9d00ac29d0 ("bpf: Fix reference state management for synchronous callbacks") > allowed too much. I think you misunderstood the change in that commit. It was about restricting callback functions from acquiring references and not releasing them before their BPF_EXIT (since our handling is not completely correct for more than one iteration). The verifier has always allowed acquiring references and transferring them to the caller for subprog calls. > I don't think it's a good idea to support coding patterns like: > void my_alloc_foo(struct foo **ptr) > { > struct foo *p = bpf_obj_new(typeof(*p)); > *ptr = p; > } > > It's a correct C, of course, but do we really want to support such code? > I don't think the verifier can fully support it anyway. > That commit of yours allowed some of it in theory, but above example probably won't work, > since 'transfer' isn't understood by the verifier. I have no strong opinions about restricting (I think the code for handling transfers is sane and correct, we just transfer the modified reference state, and it's a natural valid form of writing programs), especially since static subprogs do not have the limitations as global subprogs, and you really don't want to inline everything all the time. But I think we may end up breaking existing code/programs if we do. Whether that fallout will be small or not, I have no data yet to predict. > > Regardless whether we tighten the verifier now or later such subprogs shouldn't be throwing. > So I don't see an issue doing global prog->aux->exception. That's certainly an option, but I still think we need to be a bit careful. The reason is that during analysis, we need to determine that whenever a subprog call exits, are we in a state where we can safely unwind? It might end up restricting a large set of use cases, but I can only say with certainty after I try it out. Right now, I heavily rely on the assumption that the checks only become true when something throws (to also minimize rewrites, but that's a minor reason). The core reason is being able to argue about correctness. With global exception state, they can become true anytime, so we need to be a lot more conservative even if we e.g. didn't see a subprog as throwing from all callsites. call subprog(A) // will be rewritten, as using R1=A can throw call subprog(B) // not rewritten, as using R1=B does not throw