Re: [RFC PATCH bpf-next RESEND 00/16] bpf: Checkpoint/Restore In eBPF (CRIB)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 7/23/24 01:49, Kumar Kartikeya Dwivedi wrote:
On Tue, 23 Jul 2024 at 01:47, Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:

On Thu, Jul 11, 2024 at 12:10:17PM +0100, Juntong Deng wrote:

In restore_udp_socket I had to add a struct bpf_crib_skb_info for
restoring packets, this is because there is currently no BPF_CORE_WRITE.

I am not sure what the current attitude of the kernel community
towards BPF_CORE_WRITE is, personally I think it is well worth adding,
as we need a portable way to change the value in the kernel.

This not only allows more complexity in the CRIB restoring part to
be transferred from CRIB kfuncs to CRIB ebpf programs, but also allows
ebpf to unlock more possible application scenarios.

There are lots of interesting ideas in this patch set, but it seems they are
doing the 'C-checkpoint' part of CRIx and something like BPF_CORE_WRITE
is necessary for 'R-restore'.
I'm afraid BPF_CORE_WRITE cannot be introduced without breaking all safety nets.
It will make bpf just as unsafe as any kernel module if bpf progs can start
writing into arbitrary kernel data structures. So it's a show stopper.
If you think there is a value in adding all these iterators for 'checkpoint'
part alone we can discuss and generalize individual patches.

I think it would be better to focus on the particular problem Juntong
wants to solve, and go from there.
That might help in cutting down the size of the patch set.
It seems the main problem was restoring UDP sockets, but it got lost
among all the other stuff.
It's better to begin the discussion from there, which can still be
rooted in what you believe CRIB in general is useful for.

Also, information is missing on what the previous attempts at solving
this UDP problem were, and why they were insufficient such that BPF
was necessary.
What motivates the examples included as part of this set?
I think this particular GSoC project is not new, so what were the
limitations in previous attempts at restoring UDP sockets?

Yes, this idea originated from the GSoC task of dumping CORK-ed
UDP socket.

While I was solving this task I realized that ebpf has a much greater
potential to completely change the way we checkpoint/restore processes,
and can achieve better performance and more extensibility,
and that is CRIB.

For now, restoring CORK-ed UDP sockets is just one of the problems that
CRIB can solve, and it is not the main problem (that is, CRIB is not
designed around solving UDP problem).

(The difficulty with restoring CORK-ed UDP is that we do not have a
simple and elegant way to read back UDP packets in the write queue
before, but this is a simple task in CRIB.)

This is why I did not mention the GSoC task and the previous attempts
to solve the UDP problem in the patch set, because it is not the
same problem, and the previous solution to the UDP problem has nothing
to do with ebpf (CRIB).

But if adding this information would be useful, I can add it in the next
version of the patch set.

Adding kfuncs makes it easier to checkpoint and restore state, but it
also carries a maintenance cost.

Using BPF to speed up task state dump is going to be beneficial, but
is an orthogonal problem (and doesn't have to be CRIU specific, the
primitives that CRIU requires can be generic and used by others as
well).

You're also skirting all kinds of compatibility concerns if you encode
state to restore into structs, not getting into specifics, but if this
pattern is followed, what happens on a kernel where say a particular
field isn't available? It is a possibility that kfuncs may change
their behavior due to kernel changes (not CRIB changes particularly),
so how does user space respond to that? Sometimes, patches are
backported, how does feature detection work?

What happens when the struct used to restore is grown to accomodate
more state to restore? Kfuncs will have to detect the size of the
structure and work with multiple versions (like what nf_conntrack_bpf
kfuncs try to do with opts__sz).


You are right, so CRIB needs BPF_CORE_READ and BPF_CORE_WRITE because we
need a portable way to read/write kernel structure values, and achieving
portability only through kfuncs would be a complex tough problem.

But since BPF_CORE_WRITE cannot be introduced, we put the restoration
part on hold and focus on dumping part first, which we can achieve
portability with BPF_CORE_READ.

I tried to add io_uring and epoll iterators for capturing state
(https://lore.kernel.org/bpf/20211201042333.2035153-1-memxor@xxxxxxxxx)
a couple of years back, although I didn't have time to pursue it
further after GSoC. But I tried to minimize the restoration interfaces
exposed precisely because the above is hard to ensure. The more kfuncs
you expose to restore state, the deeper the hole becomes, since it's
meant to be a relatively user-friendly interface for CRIU to use, and
work across different kernel versions.


This was a great attempt!

I have always believed that checkpoint/restore via ebpf has great
potential (that is why I created CRIB).

If CRIB is successfully merged to the mainline, maybe we can retry
dumping io_uring and epoll via ebpf.

Can the values passed through the struct to restore state be trusted?
I'm not very well versed with the net/, but I think
bpf_restore_skb_rcv_queue isn't doing much sanitization and taking in
whatever was passed by the program. It would be helpful to explain why
that is or is not ok.


Of course it cannot be trusted, but since this is an RFC patch set,
I did not put too much effort into security checking (sanitization),
I mainly wanted to show the idea (proof of concept) and get feedback.

I will put more effort into security in subsequent versions of
the patch set.

It's easier to review if we just focus on a particular problem. I
think let's start with the UDP case, and then look at everything else
later.


Yes, it is always good to start with a particular problem.

I will focus next on solving the socket dump problem via CRIB and try to
integrate it into the CRIU project (in a personal branch).

If the above patch set is not too large, maybe I can also try to solve
one or two problems via CRIB that cannot be well dumped via procfs
in CRIU (poor performance or incomplete information).

Anyway, I will keep the next version of the patch set small and easy
to review.


High level feedback:

- no need for BPF_PROG_TYPE_CRIB program type. Existing syscall type should fit.


+1

- proposed file/socket iterators are somewhat unnecessary in this open coded form.
   there is already file/socket iterator. From the selftests it looks like it
   can be used to do 'checkpoint' part already.

+1


- KF_ITER_GETTER is a good addition, but we should be able to do it without these flags.
   kfunc-s should be able to accept iterator as an argument. Some __suffix annotation
   may be necessary to help verifier if BTF type alone of the argument won't be enough.

- KF_OBTAIN looks like a broken hammer to bypass safety. Like:

   > Currently we cannot pass the pointer returned by the iterator next
   > method as argument to the KF_TRUSTED_ARGS kfuncs, because the pointer
   > returned by the iterator next method is not "valid".

I've replied to this particular patch to explain what exact unsafety
it might introduce.
I also think the 2nd use case might be fixed by a recent patch.

[...]





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux