Re: [PATCH RFC v2 00/19] fuse: fuse-over-io-uring

On Wed, Jun 12, 2024 at 03:40:14PM GMT, Bernd Schubert wrote:
> On 6/12/24 16:19, Kent Overstreet wrote:
> > On Wed, Jun 12, 2024 at 03:53:42PM GMT, Bernd Schubert wrote:
> >> I will definitely look at it this week, although I don't like the idea
> >> of adding a new kthread. We already have an application thread and the
> >> fuse server thread; why do we need another one?
> > 
> > Ok, I hadn't found the fuse server thread - that should be fine.
> > 
> >>>
> >>> The next thing I was going to look at is how you guys are using
> >>> splice; we want to get away from that too.
> >>
> >> Well, Ming Lei is working on that for ublk_drv, and I guess that new
> >> approach could be adapted to the current io-uring scheme as well.
> >> It _probably_ wouldn't work with IORING_OP_READV/IORING_OP_WRITEV.
> >>
> >> https://lore.gnuweeb.org/io-uring/20240511001214.173711-6-ming.lei@xxxxxxxxxx/T/
> >>
> >>>
> >>> Brian was also saying the fuse virtio_fs code may be worth
> >>> investigating - maybe that could be adapted?
> >>
> >> I need to check, but really, the majority of the new additions are
> >> just setup, shutdown and sanity checks. Sending/completing requests
> >> to/from the ring does not add that many new lines.
> > 
> > What I'm wondering is how read/write requests are handled. Are the data
> > payloads going in the same ringbuffer as the commands? That could work,
> > if the ringbuffer is appropriately sized, but alignment is an issue.
> 
> That is exactly the big discussion Miklos and I have. Basically, in my
> series another buffer is vmalloced, mmaped and then assigned to ring
> entries; fuse meta headers and application payload go into that buffer,
> in both kernel/userspace directions. io-uring only allows 80B of command
> data, so only a really small request would fit into it.

Well, the generic ringbuffer would lift that restriction.
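
For reference, a rough sketch of the scheme being described - one
vmalloc'd/mmap'd buffer per ring entry holding header plus payload, with
only a tiny descriptor in the SQE. All names here are hypothetical, not
the actual structures from the patches:

  #include <stdint.h>
  #include <linux/fuse.h>

  /* Hypothetical per-ring-entry buffer, shared between kernel and
   * userspace via mmap; both request directions reuse the region. */
  struct ring_ent_buf {
          struct fuse_in_header in_hdr;       /* fuse request header     */
          char                  op_args[128]; /* per-opcode arg struct   */
          char                  payload[];    /* read/write data follows */
  };

  /* The SQE itself would then only carry a reference to the entry,
   * which easily fits in the 80B command area. */
  struct ring_sqe_cmd {
          uint32_t q_id;                      /* which queue             */
          uint32_t entry_id;                  /* which ring_ent_buf      */
  };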

> Legacy /dev/fuse has an alignment issue, as the payload follows directly
> after the fuse header - intrinsically fixed in the ring patches.

*nod*

That's the big question: put the data inline (with potential alignment
hassles) or manage (and map) a separate data structure.

Maybe padding could be inserted to solve alignment?

A separate data structure would only really be useful if it enabled zero
copy, but that should probably be a secondary enhancement.

> I will now try without mmap and instead just provide a user buffer as a
> pointer in the 80B section.
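
An 80-byte command area is enough for that. Roughly - a hypothetical
layout, assuming IORING_SETUP_SQE128 provides the 80B of cmd space:

  #include <stdint.h>

  /* Hypothetical uring_cmd payload: instead of copying through a
   * mmap'd buffer, point the kernel at a plain userspace buffer and
   * let it copy to/from it directly. */
  struct fuse_uring_cmd_req {
          uint64_t buf_addr;   /* userspace address of the data buffer */
          uint32_t buf_len;    /* size of that buffer                  */
          uint32_t flags;      /* hypothetical per-request flags       */
  };

  _Static_assert(sizeof(struct fuse_uring_cmd_req) <= 80,
                 "must fit the SQE128 command area");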
> 
> 
> > 
> > We just looked up the device DMA requirements and with modern NVME only
> > 4 byte alignment is required, but the block layer likely isn't set up to
> > handle that.
> 
> I think existing fuse headers and their data have a 4 byte alignment.
> Maybe even 8 byte, I don't remember without looking through all request
> types. If you try a simple O_DIRECT read/write to
> libfuse/example_passthrough_hp without the ring patches, it will fail
> because of alignment. This needs to be fixed in legacy fuse and would
> also avoid the compat issues we had in libfuse when the kernel header
> was updated.
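
To make the O_DIRECT failure concrete: in the legacy protocol the
FUSE_WRITE payload directly follows the request headers in the buffer
read from /dev/fuse, so it lands at a small fixed offset rather than on
any sector boundary. A tiny illustration (the offset value assumes the
current <linux/fuse.h> layout):

  #include <stdio.h>
  #include <linux/fuse.h>

  int main(void)
  {
          /* The write payload follows fuse_in_header + fuse_write_in, */
          size_t off = sizeof(struct fuse_in_header) +
                       sizeof(struct fuse_write_in);

          /* ... i.e. offset 80 with current headers - nowhere near the
           * 512B/4K alignment O_DIRECT wants, hence the failure. */
          printf("payload offset: %zu\n", off);
          return 0;
  }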
> 
> > 
> > So - prearranged buffer? Or are you using splice to get pages that
> > userspace has read into, into the kernel pagecache?
> 
> I didn't even try to use splice yet, because for the DDN (my employer)
> use case we cannot use zero copy, at least not without violating the
> rule that one cannot access the application buffer in userspace.

DDN - Lustre related?

> 
> I will definitely look into Mings work, as it will be useful for others.
> 
> 
> Cheers,
> Bernd
