Re: [RFC PATCH net-next 0/5] net: In-kernel QUIC implementation with Userspace handshake

Martin KaFai Lau <martin.lau@xxxxxxxxx> · Thu, 25 Apr 2024 21:58:50 -0700

On 4/22/24 1:58 PM, Xin Long wrote:
On Sun, Apr 21, 2024 at 3:27 PM Stefan Metzmacher <metze@xxxxxxxxx> wrote:

On 20.04.24 at 21:32, Xin Long wrote:
On Fri, Apr 19, 2024 at 3:19 PM Xin Long <lucien.xin@xxxxxxxxx> wrote:

On Fri, Apr 19, 2024 at 2:51 PM Stefan Metzmacher <metze@xxxxxxxxx> wrote:

Hi Xin Long,

But I think it's unavoidable for the ALPN and SNI fields on
the server side, as every service tries to use UDP port 443
and that port somehow needs to be shared if multiple services want to
use it.

I guess on the acceptor side we would need to somehow detach the low-level
UDP struct sock from the logical listen struct sock.

And quic_do_listen_rcv() would need to find the correct logical listening
socket and call quic_request_sock_enqueue() on the logical socket,
not the low-level UDP socket. The same goes for everything that happens after
quic_request_sock_enqueue() at the end of quic_do_listen_rcv().

The implementation allows one low-level UDP sock to serve multiple
QUIC socks.

Currently, if your 3 QUIC applications listen on the same address:port
with the SO_REUSEPORT socket option set, an incoming connection is assigned
to one of your applications, effectively at random, based on
hash(client_addr+port) via reuseport_select_sock() in quic_sock_lookup().

It should be easy to do a further match on ALPN between these 3 QUIC
socks that listen on the same address:port to get the right QUIC sock,
instead of choosing one at random.
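
A minimal userspace sketch of the ALPN comparison such a match would need, assuming the ALPN extension has already been extracted from the Client Hello. The function name alpn_list_contains() and the sample protocol names are hypothetical and not taken from the patchset; the wire format (a sequence of one-byte-length-prefixed names) is the ProtocolNameList from RFC 7301.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return 1 if `proto` appears in an ALPN protocol name list, 0 otherwise.
 * `list` points at the entries of the ProtocolNameList (RFC 7301), i.e. a
 * sequence of <1-byte length><name> items, with the outer 2-byte list
 * length already stripped. Malformed lists are treated as "no match".
 * Hypothetical helper, not part of the patchset.
 */
int alpn_list_contains(const uint8_t *list, size_t list_len, const char *proto)
{
    size_t want = strlen(proto);
    size_t off = 0;

    while (off < list_len) {
        size_t n = list[off++];

        if (n == 0 || n > list_len - off)
            return 0; /* empty or truncated entry */
        if (n == want && memcmp(list + off, proto, n) == 0)
            return 1;
        off += n;
    }
    return 0;
}
```

Each listening sock would carry the ALPN it serves, and the lookup would pick the first sock in the reuseport group for which this check succeeds, falling back to the hash-based choice otherwise.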

Ah, that sounds good.

The problem is that this requires parsing the TLS Client_Hello message in
quic_sock_lookup() to get the ALPN, which is not a proper thing to do in
the kernel and might be rejected by the networking maintainers. I need to
check with them.

Is the reassembling of CRYPTO frames done in the kernel or
userspace? Can you point me to the place in the code?
In quic_inq_handshake_tail() in the kernel. The Client Initial packet
is processed when accept() is called; this is the path:

quic_accept()-> quic_accept_sock_init() -> quic_packet_process() ->
quic_packet_handshake_process() -> quic_frame_process() ->
quic_frame_crypto_process() -> quic_inq_handshake_tail().

Note that it's with the accept sock, not the listen sock.

If it's really impossible to do in C code, maybe
registering a bpf function in order to allow a listener
to check the initial QUIC packet and decide if it wants to serve
that connection would be possible as a last resort?
That's a smart idea, man!
I think the bpf hook in reuseport_select_sock() is meant to do such
selection.

For the Client initial packet (the only packet you need to handle),
I doubt you will need to do the reassembling, as the Client Hello TLS message
is always less than 400 bytes in my env.

But I think you need to do the decryption for the Client initial packet
before decoding it then parsing the TLS message from its crypto frame.
I created this patch:

https://github.com/lxin/quic/commit/aee0b7c77df3f39941f98bb901c73fdc560befb8

to do this decryption in quic_sock_lookup() before calling
reuseport_select_sock(), so that it provides the bpf selector with
a plain-text QUIC initial packet:

https://datatracker.ietf.org/doc/html/rfc9000#section-17.2.2

If it's complex for you to do the decryption for the initial packet in
the bpf selector, I will apply this patch. Please let me know.
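
As a rough illustration of what a selector would have to walk through once it is handed a plain-text Initial packet, here is a userspace sketch of parsing the long header from RFC 9000, section 17.2.2. It is not code from the patch or a BPF program: the struct and function names are hypothetical, it only accepts QUIC version 1, and a real BPF selector would need the same bounds checks expressed in a form the verifier accepts.

```c
#include <stddef.h>
#include <stdint.h>

/* Parsed view of a plain-text QUIC v1 Initial header (hypothetical type). */
struct quic_initial_hdr {
    uint32_t version;
    const uint8_t *dcid, *scid, *token;
    uint8_t dcid_len, scid_len;
    uint64_t token_len;
    uint64_t length;  /* Length field: packet number + payload bytes */
    size_t hdr_len;   /* offset of the packet number field */
};

/* Decode a QUIC variable-length integer (RFC 9000, section 16).
 * Returns the number of bytes consumed, or 0 if the buffer is too short. */
size_t quic_varint_get(const uint8_t *p, size_t len, uint64_t *out)
{
    size_t n, i;
    uint64_t v;

    if (len == 0)
        return 0;
    n = (size_t)1 << (p[0] >> 6); /* 1, 2, 4 or 8 bytes */
    if (len < n)
        return 0;
    v = p[0] & 0x3f;
    for (i = 1; i < n; i++)
        v = (v << 8) | p[i];
    *out = v;
    return n;
}

/* Parse the header of an Initial packet. Returns 0 on success, -1 if the
 * packet is truncated, not a v1 long-header Initial, or otherwise invalid. */
int quic_initial_hdr_parse(const uint8_t *p, size_t len,
                           struct quic_initial_hdr *h)
{
    size_t off, n;

    if (len < 6)
        return -1;
    /* Header form (0x80) and fixed bit (0x40) set, packet type 0 = Initial. */
    if ((p[0] & 0xf0) != 0xc0)
        return -1;
    h->version = ((uint32_t)p[1] << 24) | ((uint32_t)p[2] << 16) |
                 ((uint32_t)p[3] << 8) | p[4];
    if (h->version != 1)
        return -1;
    off = 5;

    h->dcid_len = p[off++];
    if (h->dcid_len > 20 || len - off < (size_t)h->dcid_len + 1)
        return -1;
    h->dcid = p + off;
    off += h->dcid_len;

    h->scid_len = p[off++];
    if (h->scid_len > 20 || len - off < h->scid_len)
        return -1;
    h->scid = p + off;
    off += h->scid_len;

    n = quic_varint_get(p + off, len - off, &h->token_len);
    if (n == 0 || len - off - n < h->token_len)
        return -1;
    off += n;
    h->token = p + off;
    off += (size_t)h->token_len;

    n = quic_varint_get(p + off, len - off, &h->length);
    if (n == 0 || len - off - n < h->length)
        return -1;
    off += n;
    h->hdr_len = off;
    return 0;
}
```

After the header, the packet number occupies (p[0] & 0x03) + 1 bytes, and the CRYPTO frame carrying the Client Hello follows in the payload.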

I guess in addition to quic_server_handshake(), which is called
after accept(), there should be quic_server_prepare_listen()
(and something similar for in-kernel servers) that sets up the reuseport
magic for the socket, so that it's not needed in every application.
It's done when calling listen(); see quic_inet_listen()->quic_hash(),
where only listening sockets with sk_reuseport set are
added to the reuseport group.

It means SO_REUSEPORT sockopt must be set for every socket
before calling listen().

It seems there is only a single eBPF program possible per
reuseport group, so there has to be just a single one.
Yes, a single eBPF program per reuseport group should work.
See prepare_sk_fds() in the kernel selftests for select_reuseport bpf.

But is it possible for in-kernel servers to also register an eBPF program?
Good question. TBH, I don't really know much about eBPF programming.
I guess the real problem is how you pass the .o file to kernel space?

Another question is, in the selftests:
tools/testing/selftests/bpf/prog_tests/s
tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c

it creates a global reuseport_array and then adds the sockets
to this array for later lookup, but these sockets are all created
in the same process.

But your case is that the sockets are created in different processes.
I'm not sure if it's possible to add sockets from different processes
into the same reuseport_array?

Added Martin who introduced BPF_PROG_TYPE_SK_REUSEPORT,
I guess he may know the answers.

I didn't read the patchset, so I don't know what is meant to be done.

From what I can gather from the questions in this and the next email:

The reuseport_array is a bpf map. Like any bpf map, it can be shared across
different processes, meaning different processes can add a sk to the map.

The bpf prog that selects a sk from the reuseport_array is set by userspace
through setsockopt(SO_ATTACH_REUSEPORT_EBPF). It is the only way right now, iirc.
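
For reference, a minimal sketch of that attach step from userspace. It assumes the BPF program has already been loaded elsewhere (for example with libbpf) and that prog_fd is its file descriptor; the wrapper name attach_reuseport_prog() is hypothetical. Loading the program and filling the reuseport_array are outside the sketch.

```c
#include <sys/socket.h>

#ifndef SO_ATTACH_REUSEPORT_EBPF
/* asm-generic value; some architectures use a different number. */
#define SO_ATTACH_REUSEPORT_EBPF 52
#endif

/*
 * Attach an already loaded BPF program (prog_fd) to the reuseport group
 * of sock_fd. The socket is expected to have SO_REUSEPORT set.
 * Returns 0 on success, -1 with errno set on failure.
 */
int attach_reuseport_prog(int sock_fd, int prog_fd)
{
    return setsockopt(sock_fd, SOL_SOCKET, SO_ATTACH_REUSEPORT_EBPF,
                      &prog_fd, sizeof(prog_fd));
}
```

Since the program is attached to the group rather than to one socket, one process calling this is enough; the other processes only need a handle to the shared map to add their own sockets.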

If you can summarize what you want to be done, it could help to see if there
are ways that work for the use case.

Thanks.