On Tue, 23 Jul 2024 at 01:47, Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Thu, Jul 11, 2024 at 12:10:17PM +0100, Juntong Deng wrote: > > > > In restore_udp_socket I had to add a struct bpf_crib_skb_info for > > restoring packets, this is because there is currently no BPF_CORE_WRITE. > > > > I am not sure what the current attitude of the kernel community > > towards BPF_CORE_WRITE is, personally I think it is well worth adding, > > as we need a portable way to change the value in the kernel. > > > > This not only allows more complexity in the CRIB restoring part to > > be transferred from CRIB kfuncs to CRIB ebpf programs, but also allows > > ebpf to unlock more possible application scenarios. > > There are lots of interesting ideas in this patch set, but it seems they are > doing the 'C-checkpoint' part of CRIx and something like BPF_CORE_WRITE > is necessary for 'R-restore'. > I'm afraid BPF_CORE_WRITE cannot be introduced without breaking all safety nets. > It will make bpf just as unsafe as any kernel module if bpf progs can start > writing into arbitrary kernel data structures. So it's a show stopper. > If you think there is a value in adding all these iterators for 'checkpoint' > part alone we can discuss and generalize individual patches. I think it would be better to focus on the particular problem Juntong wants to solve, and go from there. That might help in cutting down the size of the patch set. It seems the main problem was restoring UDP sockets, but it got lost among all the other stuff. It's better to begin the discussion from there, which can still be rooted in what you believe CRIB in general is useful for. Also, information is missing on what the previous attempts at solving this UDP problem were, and why they were insufficient such that BPF was necessary. What motivates the examples included as part of this set? I think this particular GSoC project is not new, so what were the limitations in previous attempts at restoring UDP sockets? Adding kfuncs makes it easier to checkpoint and restore state, but it also carries a maintenance cost. Using BPF to speed up task state dump is going to be beneficial, but is an orthogonal problem (and doesn't have to be CRIU specific, the primitives that CRIU requires can be generic and used by others as well). You're also skirting all kinds of compatibility concerns if you encode state to restore into structs, not getting into specifics, but if this pattern is followed, what happens on a kernel where say a particular field isn't available? It is a possibility that kfuncs may change their behavior due to kernel changes (not CRIB changes particularly), so how does user space respond to that? Sometimes, patches are backported, how does feature detection work? What happens when the struct used to restore is grown to accomodate more state to restore? Kfuncs will have to detect the size of the structure and work with multiple versions (like what nf_conntrack_bpf kfuncs try to do with opts__sz). I tried to add io_uring and epoll iterators for capturing state (https://lore.kernel.org/bpf/20211201042333.2035153-1-memxor@xxxxxxxxx) a couple of years back, although I didn't have time to pursue it further after GSoC. But I tried to minimize the restoration interfaces exposed precisely because the above is hard to ensure. The more kfuncs you expose to restore state, the deeper the hole becomes, since it's meant to be a relatively user-friendly interface for CRIU to use, and work across different kernel versions. Can the values passed through the struct to restore state be trusted? I'm not very well versed with the net/, but I think bpf_restore_skb_rcv_queue isn't doing much sanitization and taking in whatever was passed by the program. It would be helpful to explain why that is or is not ok. It's easier to review if we just focus on a particular problem. I think let's start with the UDP case, and then look at everything else later. > > High level feedback: > > - no need for BPF_PROG_TYPE_CRIB program type. Existing syscall type should fit. > +1 > - proposed file/socket iterators are somewhat unnecessary in this open coded form. > there is already file/socket iterator. From the selftests it looks like it > can be used to do 'checkpoint' part already. +1 > > - KF_ITER_GETTER is a good addition, but we should be able to do it without these flags. > kfunc-s should be able to accept iterator as an argument. Some __suffix annotation > may be necessary to help verifier if BTF type alone of the argument won't be enough. > > - KF_OBTAIN looks like a broken hammer to bypass safety. Like: > > > Currently we cannot pass the pointer returned by the iterator next > > method as argument to the KF_TRUSTED_ARGS kfuncs, because the pointer > > returned by the iterator next method is not "valid". I've replied to this particular patch to explain what exact unsafety it might introduce. I also think the 2nd use case might be fixed by a recent patch. [...]