On Fri, Apr 19, 2024 at 10:07 AM Stefan Metzmacher <metze@xxxxxxxxx> wrote: > > Hi Xin Long, > > >>>>>> first many thanks for working on this topic! > >>>>>> > >>>>> Hi, Stefan > >>>>> > >>>>> Thanks for the comment! > >>>>> > >>>>>>> Usage > >>>>>>> ===== > >>>>>>> > >>>>>>> This implementation supports a mapping of QUIC into sockets APIs. Similar > >>>>>>> to TCP and SCTP, a typical Server and Client use the following system call > >>>>>>> sequence to communicate: > >>>>>>> > >>>>>>> Client Server > >>>>>>> ------------------------------------------------------------------ > >>>>>>> sockfd = socket(IPPROTO_QUIC) listenfd = socket(IPPROTO_QUIC) > >>>>>>> bind(sockfd) bind(listenfd) > >>>>>>> listen(listenfd) > >>>>>>> connect(sockfd) > >>>>>>> quic_client_handshake(sockfd) > >>>>>>> sockfd = accecpt(listenfd) > >>>>>>> quic_server_handshake(sockfd, cert) > >>>>>>> > >>>>>>> sendmsg(sockfd) recvmsg(sockfd) > >>>>>>> close(sockfd) close(sockfd) > >>>>>>> close(listenfd) > >>>>>>> > >>>>>>> Please note that quic_client_handshake() and quic_server_handshake() functions > >>>>>>> are currently sourced from libquic in the github lxin/quic repository, and might > >>>>>>> be integrated into ktls-utils in the future. These functions are responsible for > >>>>>>> receiving and processing the raw TLS handshake messages until the completion of > >>>>>>> the handshake process. > >>>>>> > >>>>>> I see a problem with this design for the server, as one reason to > >>>>>> have SMB over QUIC is to use udp port 443 in order to get through > >>>>>> firewalls. As QUIC has the concept of ALPN it should be possible > >>>>>> let a conumer only listen on a specif ALPN, so that the smb server > >>>>>> and web server on "h3" could both accept connections. > >>>>> We do provide a sockopt to set ALPN before bind or handshaking: > >>>>> > >>>>> https://github.com/lxin/quic/wiki/man#quic_sockopt_alpn > >>>>> > >>>>> But it's used more like to verify if the ALPN set on the server > >>>>> matches the one received from the client, instead of to find > >>>>> the correct server. > >>>> > >>>> Ah, ok. > >>> Just note that, with a bit change in the current libquic, it still > >>> allows users to use ALPN to find the correct function or thread in > >>> the *same* process, usage be like: > >>> > >>> listenfd = socket(IPPROTO_QUIC); > >>> /* match all during handshake with wildcard ALPN */ > >>> setsockopt(listenfd, QUIC_SOCKOPT_ALPN, "*"); > >>> bind(listenfd) > >>> listen(listenfd) > >>> > >>> while (1) { > >>> sockfd = accept(listenfd); > >>> /* the alpn from client will be set to sockfd during handshake */ > >>> quic_server_handshake(sockfd, cert); > >>> > >>> getsockopt(sockfd, QUIC_SOCKOPT_ALPN, alpn); > >> > >> Would quic_server_handshake() call setsockopt()? > > Yes, I just made a bit change in the userspace libquic: > > > > https://github.com/lxin/quic/commit/9c75bd42769a8cbc1652e2f4c8d77780f23afde6 > > > > So you can set up multple ALPNs on listen sock: > > > > setsockopt(listenfd, QUIC_SOCKOPT_ALPN, "smbd, h3, ksmbd"); > > > > Then during handshake, the matched ALPN from client will be set into > > the accept socket, then users can get it later after handshake. > > > > Note that userspace libquic is a very light lib (a couple of hundred lines > > of code), you can add more TLS related support without touching Kernel code, > > including the SNI support you mentioned. > > > >> > >>> switch (alpn) { > >>> case "smbd": smbd_thread(sockfd); > >>> case "h3": h3_thread(sockfd); > >>> case "ksmbd": ksmbd_thread(sockfd); > >>> } > >>> } > >> > >> Ok, but that would mean all application need to be aware of each other, > >> but it would be possible and socket fds could be passed to other > >> processes. > > It doesn't sound common to me, but yes, I think Unix Domain Sockets > > can pass it to another process. > > I think it will be extremely common to have multiple services > based on udp port 443. > > People will expect to find web services, smb and maybe more > behind the same dnshost name. And multiple dnshostnames pointing > to the same ip address is also very likely. > > With plain tcp/udp it's also possible to independent sockets > per port. There's no single userspace daemon that listens on > 'tcp' and will dispatch into different process base on the port. > > And with QUIC the port space is the ALPN and/or SNI > combination. > > And I think this should be addressed before this becomes an > unchangeable kernel ABI, written is stone. > > >>>>> So you expect (k)smbd server and web server both to listen on UDP > >>>>> port 443 on the same host, and which APP server accepts the request > >>>>> from a client depends on ALPN, right? > >>>> > >>>> yes. > >>> Got you. This can be done by also moving TLS 1.3 message exchange to > >>> kernel where we can get the ALPN before looking up the listening socket. > >>> However, In-kernel TLS 1.3 Handshake had been NACKed by both kernel > >>> netdev maintainers and userland ssl lib developers with good reasons. > >>> > >>>> > >>>>> Currently, in Kernel, this implementation doesn't process any raw TLS > >>>>> MSG/EXTs but deliver them to userspace after decryption, and the accept > >>>>> socket is created before processing handshake. > >>>>> > >>>>> I'm actually curious how userland QUIC handles this, considering > >>>>> that the UDP sockets('listening' on the same IP:PORT) are used in > >>>>> two different servers' processes. I think socket lookup with ALPN > >>>>> has to be done in Kernel Space. Do you know any userland QUIC > >>>>> implementation for this? > >>>> > >>>> I don't now, but I guess QUIC is only used for http so > >>>> far and maybe dns, but that seems to use port 853. > >>>> > >>>> So there's no strict need for it and the web server > >>>> would handle all relevant ALPNs. > >>> Honestly, I don't think any userland QUIC can use ALPN to lookup for > >>> different sockets used by different servers/processes. As such thing > >>> can be only done in Kernel Space. > >>> > >>>> > >>>>>> > >>>>>> So the server application should have a way to specify the desired > >>>>>> ALPN before or during the bind() call. I'm not sure if the > >>>>>> ALPN is available in cleartext before any crypto is needed, > >>>>>> so if the ALPN is encrypted it might be needed to also register > >>>>>> a server certificate and key together with the ALPN. > >>>>>> Because multiple application may not want to share the same key. > >>>>> On send side, ALPN extension is in raw TLS messages created in userspace > >>>>> and passed into the kernel and encoded into QUIC crypto frame and then > >>>>> *encrypted* before sending out. > >>>> > >>>> Ok. > >>>> > >>>>> On recv side, after decryption, the raw TLS messages are decoded from > >>>>> the QUIC crypto frame and then delivered to userspace, so in userspace > >>>>> it processes certificate validation and also see cleartext ALPN. > >>>>> > >>>>> Let me know if I don't make it clear. > >>>> > >>>> But the first "new" QUIC pdu from will trigger the accept() to > >>>> return and userspace (or the kernel helper function) will to > >>>> all crypto? Or does the first decryption happen in kernel (before accept returns)? > >>> Good question! > >>> > >>> The first "new" QUIC pdu will cause to create a 'request sock' (contains > >>> 4-tuple and connection IDs only) and queue up to reqsk list of the listen > >>> sock (if validate_peer_address param is not set), and this pdu is enqueued > >>> in the inq->backlog_list of the listen sock. > >>> > >>> When accept() is called, in Kernel, it dequeues the "request sock" from the > >>> reqsk list of the listen sock, and creates the accept socket based on this > >>> reqsk. Then it processes the pdu for this new accept socket from the > >>> inq->backlog_list of the listen sock, including *decrypting* QUIC packet > >>> and decoding CRYPTO frame, then deliver the raw/cleartext TLS message to > >>> the Userspace libquic. > >> > >> Ok, when the kernel already decrypts it could already > >> look find the ALPN. It doesn't mean it should do the full > >> handshake, but parse enough to find the ALPN. > > Correct, in-kernel QUIC should only do the QUIC related things, > > and all TLS handshake msgs must be handled in Userspace. > > This won't cause "layering violation", as Nick Banks said. > > But I think its unavoidable for the ALPN and SNI fields on > the server side. As every service tries to use udp port 443 > and somehow that needs to be shared if multiple services want to > use it. > > I guess on the acceptor side we would need to somehow detach low level > udp struct sock from the logical listen struct sock. > > And quic_do_listen_rcv() would need to find the correct logical listening > socket and call quic_request_sock_enqueue() on the logical socket > not the lowlevel udo socket. The same for all stuff happening after > quic_request_sock_enqueue() at the end of quic_do_listen_rcv. > The implementation allows one low level UDP sock to serve for multiple QUIC socks. Currently, if your 3 quic applications listen to the same address:port with SO_REUSEPORT socket option set, the incoming connection will choose one of your applications randomly with hash(client_addr+port) via reuseport_select_sock() in quic_sock_lookup(). It should be easy to do a further match with ALPN between these 3 quic socks that listens to the same address:port to get the right quic sock, instead of that randomly choosing. The problem is to parse the TLS Client_Hello message to get the ALPN in quic_sock_lookup(), which is not a proper thing to do in kernel, and might be rejected by networking maintainers, I need to check with them. Will you be able to work around this by using Unix Domain Sockets pass the sockfd to another process? (Note that we're assuming all your 3 applications are using in-kernel QUIC) > >> But I don't yet understand how the kernel gets the key to > >> do the initlal decryption, I'd assume some call before listen() > >> need to tell the kernel about the keys. > > For initlal decryption, the keys can be derived with the initial packet. > > basically, it only needs the dst_connection_id from the client initial > > packet. see: > > > > https://datatracker.ietf.org/doc/html/rfc9001#name-initial-secrets > > > > so we don't need to set up anything to kernel for initial's keys. > > I got it thanks! > > metze >