Re: [PATCH net-next v11 00/21] io_uring zero copy rx

On 1/17/25 3:42 PM, Pavel Begunkov wrote:
> On 1/17/25 14:28, Paolo Abeni wrote:
>> On 1/17/25 12:16 AM, David Wei wrote:
>>> This patchset adds support for zero copy rx into userspace pages using
>>> io_uring, eliminating a kernel to user copy.
>>>
>>> We configure a page pool that a driver uses to fill a hw rx queue to
>>> hand out user pages instead of kernel pages. Any data that ends up
>>> hitting this hw rx queue will thus be dma'd into userspace memory
>>> directly, without needing to be bounced through kernel memory. 'Reading'
>>> data out of a socket instead becomes a _notification_ mechanism, where
>>> the kernel tells userspace where the data is. The overall approach is
>>> similar to the devmem TCP proposal.
>>>
>>> This relies on hw header/data split, flow steering and RSS to ensure
>>> packet headers remain in kernel memory and only desired flows hit a hw
>>> rx queue configured for zero copy. Configuring this is outside of the
>>> scope of this patchset.
>>>
>>> We share netdev core infra with devmem TCP. The main difference is that
>>> io_uring is used for the uAPI and the lifetimes of all objects are bound
>>> to an io_uring instance. Data is 'read' using a new io_uring request
>>> type. When done, data is returned via a new shared refill queue. A zero
>>> copy page pool refills a hw rx queue from this refill queue directly. Of
>>> course, the lifetimes of these data buffers are managed by io_uring
>>> rather than the networking stack, with different refcounting rules.
>>>
>>> This patchset is the first step adding basic zero copy support. We will
>>> extend this iteratively with new features e.g. dynamically allocated
>>> zero copy areas, THP support, dmabuf support, improved copy fallback,
>>> general optimisations and more.
>>>
>>> In terms of netdev support, we're first targeting Broadcom bnxt. Patches
>>> aren't included since Taehee Yoo has already sent a more comprehensive
>>> patchset adding support in [1]. Google gve should already support this,
>>> and Mellanox mlx5 support is WIP pending driver changes.
>>>
>>> ===========
>>> Performance
>>> ===========
>>>
>>> Note: Comparison with epoll + TCP_ZEROCOPY_RECEIVE isn't done yet.
>>>
>>> Test setup:
>>> * AMD EPYC 9454
>>> * Broadcom BCM957508 200G
>>> * Kernel v6.11 base [2]
>>> * liburing fork [3]
>>> * kperf fork [4]
>>> * 4K MTU
>>> * Single TCP flow
>>>
>>> With application thread + net rx softirq pinned to _different_ cores:
>>>
>>> +-------------------------------+
>>> | epoll     | io_uring          |
>>> |-----------|-------------------|
>>> | 82.2 Gbps | 116.2 Gbps (+41%) |
>>> +-------------------------------+
>>>
>>> Pinned to _same_ core:
>>>
>>> +-------------------------------+
>>> | epoll     | io_uring          |
>>> |-----------|-------------------|
>>> | 62.6 Gbps | 80.9 Gbps (+29%)  |
>>> +-------------------------------+
>>>
>>> =====
>>> Links
>>> =====
>>>
>>> Broadcom bnxt support:
>>> [1]: https://lore.kernel.org/netdev/20241003160620.1521626-8-ap420073@xxxxxxxxx/
>>>
>>> Linux kernel branch:
>>> [2]: https://github.com/spikeh/linux.git zcrx/v9
>>>
>>> liburing for testing:
>>> [3]: https://github.com/isilence/liburing.git zcrx/next
>>>
>>> kperf for testing:
>>> [4]: https://git.kernel.dk/kperf.git
>>
>> We are getting very close to the merge window. To get this series
>> merged before that deadline, the point raised by Jakub on this version
>> must be resolved, the next iteration should land on the ML before the
>> end of the current working day, and the series must apply cleanly to
>> net-next so that it can be processed by our CI.
> 
> Sounds good, thanks Paolo.
> 
> Since the merging is not trivial, I'll send a PR for the net/
> patches instead of reposting the entire thing, if that sounds right
> to you. The rest will be handled on the io_uring side.

I agree that is the more straightforward path. @Jakub: do you see any
problem with the above?

/P





