On Mon, Nov 6, 2023 at 2:56 PM Willem de Bruijn
<willemdebruijn.kernel@xxxxxxxxx> wrote:
>
> On Mon, Nov 6, 2023 at 2:34 PM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:
> >
> > On 11/06, Willem de Bruijn wrote:
> > > > > IMHO, we need a better UAPI to receive the tokens and give them back to
> > > > > the kernel. CMSG + setsockopt(SO_DEVMEM_DONTNEED) get the job done,
> > > > > but look dated and hacky :-(
> > > > >
> > > > > We should either do some kind of user/kernel shared memory queue to
> > > > > receive/return the tokens (similar to what Jonathan was doing in his
> > > > > proposal?)
> > > >
> > > > I'll take a look at Jonathan's proposal, sorry, I'm not immediately
> > > > familiar but I wanted to respond :-) But is the suggestion here to
> > > > build a new kernel-user communication channel primitive for the
> > > > purpose of passing the information in the devmem cmsg? IMHO that seems
> > > > like an overkill. Why add 100-200 lines of code to the kernel to add
> > > > something that can already be done with existing primitives? I don't
> > > > see anything concretely wrong with cmsg & setsockopt approach, and if
> > > > we switch to something I'd prefer to switch to an existing primitive
> > > > for simplicity?
> > > >
> > > > The only other existing primitive to pass data outside of the linear
> > > > buffer is the MSG_ERRQUEUE that is used for zerocopy. Is that
> > > > preferred? Any other suggestions or existing primitives I'm not aware
> > > > of?
> > > >
> > > > > or bite the bullet and switch to io_uring.
> > > > >
> > > > IMO io_uring & socket support are orthogonal, and one doesn't preclude
> > > > the other. As you know we like to use sockets and I believe there are
> > > > issues with io_uring adoption at Google that I'm not familiar with
> > > > (and could be wrong). I'm interested in exploring io_uring support as
> > > > a follow up but I think David Wei will be interested in io_uring
> > > > support as well anyway.
> > > I also disagree that we need to replace a standard socket interface
> > > with something "faster", in quotes.
> > >
> > > This interface is not the bottleneck to the target workload.
> > >
> > > Replacing the synchronous sockets interface with something more
> > > performant for workloads where it is, is an orthogonal challenge.
> > > However we do that, I think that traditional sockets should continue
> > > to be supported.
> > >
> > > The feature may already even work with io_uring, as both recvmsg with
> > > cmsg and setsockopt have io_uring support now.
> > I'm not really concerned with faster. I would prefer something cleaner :-)
> >
> > Or maybe we should just have it documented. With some kind of path
> > towards beautiful world where we can create dynamic queues..
>
> I suppose we just disagree on the elegance of the API.

Yeah, I might be overly sensitive to the apis that use get/setsockopt
for something more involved than setting a flag. Probably because I
know that bpf will (unnecessarily) trigger on these :-D I had to
implement that bpf "bypass" (or fastpath) for TCP_ZEROCOPY_RECEIVE and
it looks like this token recycle might also benefit from something
similar.

> The concise notification API returns tokens as a range for
> compression, encoding as two 32-bit unsigned integers start + length.
> It allows for even further batching by returning multiple such ranges
> in a single call.

Tangential: should tokens be u64? Otherwise we can't have more than
4gb unacknowledged. Or that's a reasonable constraint?
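To make sure we're picturing the same flow, here is an untested sketch
of how I read the cmsg + SO_DEVMEM_DONTNEED round trip from userspace.
SO_DEVMEM_DONTNEED is the option name from the series; SCM_DEVMEM_TOKEN,
struct devmem_token and the numeric values below are made-up
placeholders for illustration, not the real uapi:

/* Untested sketch, not the real uapi: SO_DEVMEM_DONTNEED is from the
 * series, SCM_DEVMEM_TOKEN / struct devmem_token / the values below
 * are placeholders just to illustrate the round trip.
 */
#include <sys/socket.h>
#include <sys/uio.h>
#include <stdint.h>
#include <string.h>

#ifndef SO_DEVMEM_DONTNEED
#define SO_DEVMEM_DONTNEED      98      /* placeholder value, sketch only */
#endif
#ifndef SCM_DEVMEM_TOKEN
#define SCM_DEVMEM_TOKEN        99      /* placeholder cmsg type, sketch only */
#endif

struct devmem_token {                   /* start + length, as described above */
        uint32_t token_start;
        uint32_t token_count;
};

static void recv_and_recycle(int fd)
{
        char data[4096];
        char control[CMSG_SPACE(sizeof(struct devmem_token)) * 16];
        struct devmem_token tokens[16];
        size_t ntok = 0;

        struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
        struct msghdr msg = {
                .msg_iov = &iov,
                .msg_iovlen = 1,
                .msg_control = control,
                .msg_controllen = sizeof(control),
        };

        if (recvmsg(fd, &msg, 0) <= 0)
                return;

        /* Gather the token ranges the kernel attached as cmsgs. */
        for (struct cmsghdr *cm = CMSG_FIRSTHDR(&msg); cm;
             cm = CMSG_NXTHDR(&msg, cm)) {
                if (cm->cmsg_level != SOL_SOCKET ||
                    cm->cmsg_type != SCM_DEVMEM_TOKEN)
                        continue;
                if (ntok < 16)
                        memcpy(&tokens[ntok++], CMSG_DATA(cm),
                               sizeof(tokens[0]));
        }

        /* ... consume the payload the tokens refer to ... */

        /* Hand the whole batch of ranges back in one setsockopt call. */
        if (ntok)
                setsockopt(fd, SOL_SOCKET, SO_DEVMEM_DONTNEED,
                           tokens, ntok * sizeof(tokens[0]));
}

The batching you mention is the tokens[] array here: multiple
{start, length} ranges handed back in a single call.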
> This is analogous to the MSG_ZEROCOPY notification mechanism from
> kernel to user.
>
> The synchronous socket syscall interface can be replaced by something
> asynchronous like io_uring. This already works today? Whatever
> asynchronous ring-based API would be selected, io_uring or otherwise,
> I think the concise notification encoding would remain as is.
>
> Since this is an operation on a socket, I find a setsockopt the
> fitting interface.
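For reference, the MSG_ZEROCOPY notification mechanism you're drawing
the analogy to looks roughly like this from the userspace side
(condensed from Documentation/networking/msg_zerocopy.rst, IPv4 case,
error handling omitted):

/* Read MSG_ZEROCOPY completions off the socket error queue. */
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/errqueue.h>
#include <string.h>

static void read_zerocopy_completions(int fd)
{
        char control[128];
        struct msghdr msg = {
                .msg_control = control,
                .msg_controllen = sizeof(control),
        };

        if (recvmsg(fd, &msg, MSG_ERRQUEUE) < 0)
                return;

        for (struct cmsghdr *cm = CMSG_FIRSTHDR(&msg); cm;
             cm = CMSG_NXTHDR(&msg, cm)) {
                struct sock_extended_err serr;

                /* IPv4; for IPv6 it's SOL_IPV6 / IPV6_RECVERR. */
                if (cm->cmsg_level != SOL_IP || cm->cmsg_type != IP_RECVERR)
                        continue;

                memcpy(&serr, CMSG_DATA(cm), sizeof(serr));
                if (serr.ee_origin != SO_EE_ORIGIN_ZEROCOPY)
                        continue;

                /* Sends with notification ids ee_info..ee_data (inclusive)
                 * have completed; their buffers can be reused.
                 */
        }
}

The parallel is the encoding: completions there are also reported as an
inclusive u32 range (ee_info..ee_data), they just arrive over the error
queue instead of as a cmsg on the data path.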