Re: [LSF/MM/BPF TOPIC] FUSE io_uring zero copy

On 2025-01-30 14:05, Bernd Schubert wrote:
> Hi David,
> 
> I would love to participate in this discussion and the page
> migration/tmp-page discussions, but I don't think I can make it to LSF/MM.

Thanks Bernd! Looking forward to discussing this with you.

> 
> On 1/30/25 22:28, David Wei wrote:
>> Hi folks, I want to propose a discussion on adding zero copy to FUSE
>> io_uring in the kernel. The source is some userspace buffer or device
>> memory e.g. GPU VRAM. The destination is a FUSE server in userspace, which
>> will then either forward it over the network or to an underlying
>> FS/block device. The FUSE server may want to read the data.
>>
>> My goal is to eliminate copies in this entire data path, including the
>> initial hop between the userspace client and the kernel. I know Ming and
>> Keith are working on adding ublk zero copy but it does not cover this
>> initial hop and it does not allow the ublk/FUSE server to read the data.
>>
>> My idea is to use shared memory or dma-buf, i.e. the source data is
>> encapsulated in an mmap()able fd. The client and FUSE server exchange
>> this fd through a back channel with no kernel involvement. The FUSE
>> server maps the fd into its address space and registers the fd with
>> io_uring via the io_uring_register() infra. When the client does e.g. a
>> DIO write, the pages are pinned and forwarded to FUSE kernel, which does
>> a lookup and understands that the pages belong to the fd that was
>> registered from the FUSE server. Then io_uring tells the FUSE server
>> that the data is in the fd it registered, so there is no need to copy
>> anything at all.
> 
> For specific applications that know the protocol, that should work.
> 
>>
>> I would like to discuss this and get feedback from the community. My top
>> question is why do this in the kernel at all? It is entirely possible to
>> bypass the kernel entirely by having the client and FUSE server exchange
>> the fd and then do the I/O purely through IPC.
> 
> Because then we leave POSIX and it becomes rather FUSE specific.

Yeah, good point. Another consideration is ease of use from the client's
perspective. Yes, the client has to do the setup with the FUSE server over
a back channel. But that could be as simple as using one of the many ways
of passing and installing fds between two processes, e.g. io_uring or
SCM_RIGHTS.
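
To make the SCM_RIGHTS option concrete, here is a minimal sketch of the
fd handoff. It is only an illustration under stated assumptions, not the
proposed kernel interface: the helper names (send_fd, recv_fd,
demo_fd_handoff) are hypothetical, an unlinked temporary file stands in
for the shared memory or dma-buf fd, and both ends of the socketpair
live in one process to keep it short.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one fd (see cmsg(3)). */
union fd_ctrl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Hypothetical helper: send one fd over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
    char byte = 'F'; /* stream sockets need at least one byte of real data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd, or return -1 on error. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* "Client" fills a shared mapping, hands the fd over, and the "server"
 * maps the received fd and sees the same bytes. Returns 0 on success. */
int demo_fd_handoff(void)
{
    char path[] = "/tmp/zc-demo-XXXXXX";
    const size_t len = 4096;
    int sv[2] = { -1, -1 };
    int fd, peer_fd = -1, ret = -1;
    char *src = MAP_FAILED, *dst = MAP_FAILED;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* keep only the fd; the name is not needed */
    if (ftruncate(fd, (off_t)len) < 0)
        goto out;
    src = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (src == MAP_FAILED)
        goto out;
    strcpy(src, "payload");

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0 || send_fd(sv[0], fd) < 0)
        goto out;
    peer_fd = recv_fd(sv[1]);
    if (peer_fd < 0)
        goto out;
    dst = mmap(NULL, len, PROT_READ, MAP_SHARED, peer_fd, 0);
    if (dst == MAP_FAILED)
        goto out;
    ret = strcmp(dst, "payload") == 0 ? 0 : -1;
out:
    if (dst != MAP_FAILED) munmap(dst, len);
    if (src != MAP_FAILED) munmap(src, len);
    if (peer_fd >= 0) close(peer_fd);
    if (sv[0] >= 0) close(sv[0]);
    if (sv[1] >= 0) close(sv[1]);
    close(fd);
    return ret;
}
```

In a real setup the two socket ends would sit in the client and the FUSE
server, and the server would go on to register the received fd with
io_uring. That registration step is the part under discussion here, so
it is deliberately left out of the sketch.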

But the advantage is that DIO write() is the same as before. The kernel
takes over from that point onwards, all via standard kernel concepts.
Doing it _purely_ in userspace would need completely custom code.

I think this is a useful addition to the kernel and FUSE, one that someone
else can make use of without needing to write their own code. If there
is a +1 voice at the conference, that would be a great result and would
give me the confidence to go and build it.

> 
> 
> Thanks,
> Bernd
> 
> 



