On Sat, 2022-03-26 at 14:30 -0700, Jakub Kicinski wrote:
> On Sat, 26 Mar 2022 15:06:40 -0600 Jens Axboe wrote:
> > On 3/26/22 2:57 PM, Jens Axboe wrote:
> > > > I'd also like to have a conversation about continuing to use
> > > > the socket as a proxy for NAPI_ID, NAPI_ID is exposed to user
> > > > space now. io_uring being a new interface I wonder if it's not
> > > > better to let the user specify the request parameters
> > > > directly.
> > >
> > > Definitely open to something that makes more sense, given we
> > > don't have to shoehorn things through the regular API for NAPI
> > > with io_uring.
> >
> > The most appropriate is probably to add a way to get/set NAPI
> > settings on a per-io_uring basis, eg through io_uring_register(2).
> > It's a bit more difficult if they have to be per-socket, as the
> > polling happens off what would normally be the event wait path.
> >
> > What did you have in mind?
>
> Not sure I fully comprehend what the current code does. IIUC it uses
> the socket and then caches its napi_id, presumably because it doesn't
> want to hold a reference on the socket?

Again, the io_uring napi busy_poll integration is strongly inspired by
the epoll implementation, which caches a single napi_id.

I guess I reverse engineered the rationale behind the epoll design
decisions. If you were to busy poll the receive queues for a socket set
containing hundreds of thousands of sockets, would you rather scan the
whole socket set to find out which queues to poll, or simply iterate
through a list containing a dozen or so ids?

>
> This may give the user a false impression that the polling follows
> the socket. NAPIs may get reshuffled underneath on pretty random
> reconfiguration / recovery events (random == driver dependent).

There is nothing random. When a socket is added to the poll set, its
receive queue is added to the short list of queues to poll.
A very common usage pattern among networking applications is to
reinsert the socket into the polling set after each polling event. In
recognition of this pattern, and to avoid allocating/freeing memory to
modify the napi_id list all the time, each napi id is kept in the list
until a very long period of inactivity is reached, at which point it is
finally removed so that its receive queue stops being busy polled.

>
> I'm not entirely clear how the thing is supposed to be used with TCP
> socket, as from a quick grep it appears that listening sockets don't
> get napi_id marked at all.
>
> The commit mentions a UDP benchmark, Olivier can you point me to more
> info on the use case? I'm mostly familiar with NAPI busy poll with XDP
> sockets, where it's pretty obvious.

https://github.com/lano1106/io_uring_udp_ping

IDK what else I can tell you. I chose to unit test the new feature with
a UDP app because it was the simplest setup for testing. AFAIK, the
ultimate goal of busy polling is to minimize packet reception latency,
and the NAPI busy polling code should not treat packets differently
whether they are UDP, TCP, or whatever type of frames the NIC
receives...

>
> My immediate reaction is that we should either explicitly call out
> NAPI instances by id in uAPI, or make sure we follow the socket in
> every case. Also we can probably figure out an easy way of avoiding
> the hash table lookups and cache a pointer to the NAPI struct.
>

That is an interesting idea. If the NAPI API offered this, I would
gladly use it to avoid the hash lookup. IMHO it would be a very
interesting improvement, but hopefully it should not block my patch...