On Thu, 10 Nov 2022 15:36:34 -0800 Stefan Roesch wrote: > Jakub Kicinski <kuba@xxxxxxxxxx> writes: > > On Mon, 7 Nov 2022 09:52:39 -0800 Stefan Roesch wrote: > >> This adds the napi busy polling support in io_uring.c. It adds a new > >> napi_list to the io_ring_ctx structure. This list contains the list of > >> napi_id's that are currently enabled for busy polling. The list is > >> synchronized by the new napi_lock spin lock. The current default napi > >> busy polling time is stored in napi_busy_poll_to. If napi busy polling > >> is not enabled, the value is 0. > >> > >> The busy poll timeout is also stored as part of the io_wait_queue. This > >> is necessary as for sq polling the poll interval needs to be adjusted > >> and the napi callback allows only to pass in one value. > >> > >> Testing has shown that the round-trip times are reduced to 38us from > >> 55us by enabling napi busy polling with a busy poll timeout of 100us. > > > > What's the test, exactly? What's the network latency? Did you busy poll > > on both ends? > > The test programs are part of the liburing patches. They consist of a > client and server program. The client sends a request, which has a timestamp > in its payload and the server replies with the same payload. The client > calculates the roundtrip time and stores it to calcualte the results. > > The client is running on host1 and the server is running on host 2. The > measured times below are roundtrip times. These are average times over > 10 runs each. > > If no napi busy polling wait is used : 55us > If napi with client busy polling is used : 44us > If napi busy polling is used on the client and server: 38us > > If you think the numbers are not that useful, I can remove them from the > commit message. The latency numbers are a sum of a few components so you'd need to break them down a little further. At least for me. I'd anticipate we'll have networking delay, IRQ/completion coalescing in the NIC, and then SW processing time. I was suspecting you were only busy polling on one end, because the 38us is very close to the default IRQ coalescing "we" have (33us). Simplest way to provide a clear number would be to test with IRQ coal set to 0/1, and back-to-back machines (or within a rack). If that's what you did then just add the info to the msg and the numbers are good :) > >> + list_for_each_entry_safe(ne, n, napi_list, list) { > >> + napi_busy_loop(ne->napi_id, NULL, NULL, true, BUSY_POLL_BUDGET); > > > > You can't opt the user into prefer busy poll without the user asking > > for it. Default to false and add an explicit knob like patch 2. > > > > The above code is from the function io_napi_blocking_busy_loop(). > However this function is only called when a busy poll timeout has been > configured. > > #ifdef CONFIG_NET_RX_BUSY_POLL > if (iowq.busy_poll_to) > io_napi_blocking_busy_loop(&local_napi_list, &iowq); > > However we don't have that check for sqpoll, so we should add a check > for the napi busy poll timeout in __io_sq_thread. > > Do we really need a knob to store if napi busy polling is enabled or is > sufficent to store a napi busy poll timeout value? I was asking about *prefer* busy poll, IOW SO_PREFER_BUSY_POLL. So I'm talking about argument 4 being set to true. This feature requires system configuration to arm timers to the correct values within the netdev code. Normal epoll path always passes false there, IIRC. We can add the support in iouring but we need an opt-in.