On 2023-11-07 15:03, Mina Almasry wrote:
> On Tue, Nov 7, 2023 at 2:55 PM David Ahern <dsahern@xxxxxxxxxx> wrote:
>>
>> On 11/7/23 3:10 PM, Mina Almasry wrote:
>>> On Mon, Nov 6, 2023 at 3:44 PM David Ahern <dsahern@xxxxxxxxxx> wrote:
>>>>
>>>> On 11/5/23 7:44 PM, Mina Almasry wrote:
>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>>> index eeeda849115c..1c351c138a5b 100644
>>>>> --- a/include/linux/netdevice.h
>>>>> +++ b/include/linux/netdevice.h
>>>>> @@ -843,6 +843,9 @@ struct netdev_dmabuf_binding {
>>>>>  };
>>>>>
>>>>>  #ifdef CONFIG_DMA_SHARED_BUFFER
>>>>> +struct page_pool_iov *
>>>>> +netdev_alloc_devmem(struct netdev_dmabuf_binding *binding);
>>>>> +void netdev_free_devmem(struct page_pool_iov *ppiov);
>>>>
>>>> netdev_{alloc,free}_dmabuf?
>>>>
>>>
>>> Can do.
>>>
>>>> I say that because a dmabuf can be host memory, at least I am not aware
>>>> of a restriction that a dmabuf is device memory.
>>>>
>>>
>>> In my limited experience dma-buf is generally device memory, and
>>> that's really its use case. CONFIG_UDMABUF is a driver that mocks
>>> dma-buf with a memfd which I think is used for testing. But I can do
>>> the rename, it's more clear anyway, I think.
>>
>> config UDMABUF
>>         bool "userspace dmabuf misc driver"
>>         default n
>>         depends on DMA_SHARED_BUFFER
>>         depends on MEMFD_CREATE || COMPILE_TEST
>>         help
>>           A driver to let userspace turn memfd regions into dma-bufs.
>>           Qemu can use this to create host dmabufs for guest framebuffers.
>>
>>
>> Qemu is just a userspace process; it is no way a special one.
>>
>> Treating host memory as a dmabuf should radically simplify the io_uring
>> extension of this set.
>
> I agree actually, and I was about to make that comment to David Wei's
> series once I have the time.
>
> David, your io_uring RX zerocopy proposal actually works with devmem
> TCP, if you're inclined to do that instead, what you'd do roughly is
> (I think):
>
> - Allocate a memfd,
> - Use CONFIG_UDMABUF to create a dma-buf out of that memfd.
> - Bind the dma-buf to the NIC using the netlink API in this RFC.
> - Your io_uring extensions and io_uring uapi should work as-is almost
>   on top of this series, I think.
>
> If you do this the incoming packets should land into your memfd, which
> may or may not work for you. In the future if you feel inclined to use
> device memory, this approach that I'm describing here would be more
> extensible to device memory, because you'd already be using dma-bufs
> for your user memory; you'd just replace one kind of dma-buf (UDMABUF)
> with another.
>

How would TCP devmem change if we no longer assume that dmabuf is device
memory? Pavel will know more on the perf side, but I wouldn't want to
put any if/else on the hot path if we can avoid it. I could be wrong,
but right now in my mind using different memory providers solves this
neatly and the driver/networking stack doesn't need to care.

Mina, I believe you said at the NetDev conf that you already have a
udmabuf implementation for testing. I would like to take a look at it
(feel free to send it privately) to see how TCP devmem would handle
both user memory and device memory.

>> That the io_uring set needs to dive into
>> page_pools is just wrong - complicating the design and code and pushing
>> io_uring into a realm it does not need to be involved in.
>>
>> Most (all?) of this patch set can work with any memory; only device
>> memory is unreadable.
>>
>>
>
>
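
For reference, a rough userspace sketch of the first two steps Mina
lists above (memfd, then udmabuf): it only shows creating a host-memory
dma-buf; the netlink bind and the io_uring parts are not shown. This
assumes the standard /dev/udmabuf UDMABUF_CREATE ioctl and that the
memfd is sealed with F_SEAL_SHRINK (which udmabuf requires). The buffer
name and the 16MB size are arbitrary, and CONFIG_UDMABUF must of course
be enabled:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/udmabuf.h>

int main(void)
{
	size_t size = 16 * 1024 * 1024;	/* page-aligned buffer size */

	/* Step 1: create a sealable memfd and size it. */
	int memfd = memfd_create("devmem-test", MFD_ALLOW_SEALING);
	if (memfd < 0) { perror("memfd_create"); return 1; }
	if (ftruncate(memfd, size) < 0) { perror("ftruncate"); return 1; }

	/* udmabuf refuses memfds that are not sealed against shrinking. */
	if (fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK) < 0) {
		perror("F_ADD_SEALS"); return 1;
	}

	/* Step 2: turn the memfd region into a dma-buf via /dev/udmabuf. */
	int devfd = open("/dev/udmabuf", O_RDWR);
	if (devfd < 0) { perror("open /dev/udmabuf"); return 1; }

	struct udmabuf_create create = {
		.memfd  = memfd,
		.flags  = UDMABUF_FLAGS_CLOEXEC,
		.offset = 0,
		.size   = size,
	};
	int dmabuf_fd = ioctl(devfd, UDMABUF_CREATE, &create);
	if (dmabuf_fd < 0) { perror("UDMABUF_CREATE"); return 1; }

	/*
	 * Step 3 would pass dmabuf_fd to the netlink bind API from this
	 * RFC to attach it to a NIC RX queue (not shown here).
	 */
	printf("created dma-buf fd %d backed by host memory\n", dmabuf_fd);

	close(dmabuf_fd);
	close(devfd);
	close(memfd);
	return 0;
}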