Re: [RFC PATCH bpf-next RESEND 00/16] bpf: Checkpoint/Restore In eBPF (CRIB)

Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> · Tue, 23 Jul 2024 02:49:38 +0200

On Tue, 23 Jul 2024 at 01:47, Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Thu, Jul 11, 2024 at 12:10:17PM +0100, Juntong Deng wrote:
> >
> > In restore_udp_socket I had to add a struct bpf_crib_skb_info for
> > restoring packets, this is because there is currently no BPF_CORE_WRITE.
> >
> > I am not sure what the current attitude of the kernel community
> > towards BPF_CORE_WRITE is, personally I think it is well worth adding,
> > as we need a portable way to change the value in the kernel.
> >
> > This not only allows more complexity in the CRIB restoring part to
> > be transferred from CRIB kfuncs to CRIB ebpf programs, but also allows
> > ebpf to unlock more possible application scenarios.
>
> There are lots of interesting ideas in this patch set, but it seems they are
> doing the 'C-checkpoint' part of CRIx and something like BPF_CORE_WRITE
> is necessary for 'R-restore'.
> I'm afraid BPF_CORE_WRITE cannot be introduced without breaking all safety nets.
> It will make bpf just as unsafe as any kernel module if bpf progs can start
> writing into arbitrary kernel data structures. So it's a show stopper.
> If you think there is a value in adding all these iterators for 'checkpoint'
> part alone we can discuss and generalize individual patches.

I think it would be better to focus on the particular problem Juntong
wants to solve, and go from there.
That might help in cutting down the size of the patch set.
It seems the main problem was restoring UDP sockets, but it got lost
among all the other stuff.
It's better to begin the discussion from there, which can still be
rooted in what you believe CRIB in general is useful for.

Also, information is missing on what the previous attempts at solving
this UDP problem were, and why they were insufficient such that BPF
was necessary.
What motivates the examples included as part of this set?
I think this particular GSoC project is not new, so what were the
limitations in previous attempts at restoring UDP sockets?
Adding kfuncs makes it easier to checkpoint and restore state, but it
also carries a maintenance cost.

Using BPF to speed up task state dump is going to be beneficial, but
is an orthogonal problem (and doesn't have to be CRIU specific, the
primitives that CRIU requires can be generic and used by others as
well).

You're also skirting all kinds of compatibility concerns if you encode
state to restore into structs, not getting into specifics, but if this
pattern is followed, what happens on a kernel where say a particular
field isn't available? It is a possibility that kfuncs may change
their behavior due to kernel changes (not CRIB changes particularly),
so how does user space respond to that? Sometimes, patches are
backported, how does feature detection work?

What happens when the struct used to restore is grown to accomodate
more state to restore? Kfuncs will have to detect the size of the
structure and work with multiple versions (like what nf_conntrack_bpf
kfuncs try to do with opts__sz).

I tried to add io_uring and epoll iterators for capturing state
(https://lore.kernel.org/bpf/20211201042333.2035153-1-memxor@xxxxxxxxx)
a couple of years back, although I didn't have time to pursue it
further after GSoC. But I tried to minimize the restoration interfaces
exposed precisely because the above is hard to ensure. The more kfuncs
you expose to restore state, the deeper the hole becomes, since it's
meant to be a relatively user-friendly interface for CRIU to use, and
work across different kernel versions.

Can the values passed through the struct to restore state be trusted?
I'm not very well versed with the net/, but I think
bpf_restore_skb_rcv_queue isn't doing much sanitization and taking in
whatever was passed by the program. It would be helpful to explain why
that is or is not ok.

It's easier to review if we just focus on a particular problem. I
think let's start with the UDP case, and then look at everything else
later.

>
> High level feedback:
>
> - no need for BPF_PROG_TYPE_CRIB program type. Existing syscall type should fit.
>

+1

> - proposed file/socket iterators are somewhat unnecessary in this open coded form.
>   there is already file/socket iterator. From the selftests it looks like it
>   can be used to do 'checkpoint' part already.

+1

>
> - KF_ITER_GETTER is a good addition, but we should be able to do it without these flags.
>   kfunc-s should be able to accept iterator as an argument. Some __suffix annotation
>   may be necessary to help verifier if BTF type alone of the argument won't be enough.
>
> - KF_OBTAIN looks like a broken hammer to bypass safety. Like:
>
>   > Currently we cannot pass the pointer returned by the iterator next
>   > method as argument to the KF_TRUSTED_ARGS kfuncs, because the pointer
>   > returned by the iterator next method is not "valid".

I've replied to this particular patch to explain what exact unsafety
it might introduce.
I also think the 2nd use case might be fixed by a recent patch.

[...]