On Wed, Jun 12, 2024 at 03:40:14PM GMT, Bernd Schubert wrote:
> On 6/12/24 16:19, Kent Overstreet wrote:
> > On Wed, Jun 12, 2024 at 03:53:42PM GMT, Bernd Schubert wrote:
> >> I will definitely look at it this week. Although I don't like the idea
> >> to have a new kthread. We already have an application thread and have
> >> the fuse server thread, why do we need another one?
> >
> > Ok, I hadn't found the fuse server thread - that should be fine.
> >
> >>>
> >>> The next thing I was going to look at is how you guys are using splice,
> >>> we want to get away from that too.
> >>
> >> Well, Ming Lei is working on that for ublk_drv and I guess that new approach
> >> could be adapted as well onto the current way of io-uring.
> >> It _probably_ wouldn't work with IORING_OP_READV/IORING_OP_WRITEV.
> >>
> >> https://lore.gnuweeb.org/io-uring/20240511001214.173711-6-ming.lei@xxxxxxxxxx/T/
> >>
> >>>
> >>> Brian was also saying the fuse virtio_fs code may be worth
> >>> investigating, maybe that could be adapted?
> >>
> >> I need to check, but really, the majority of the new additions
> >> is just to set up things, shutdown and to have sanity checks.
> >> Request sending/completing to/from the ring is not that much new lines.
> >
> > What I'm wondering is how read/write requests are handled. Are the data
> > payloads going in the same ringbuffer as the commands? That could work,
> > if the ringbuffer is appropriately sized, but alignment is a an issue.
>
> That is exactly the big discussion Miklos and I have. Basically in my
> series another buffer is vmalloced, mmaped and then assigned to ring entries.
> Fuse meta headers and application payload goes into that buffer.
> In both kernel/userspace directions. io-uring only allows 80B, so only a
> really small request would fit into it.

Well, the generic ringbuffer would lift that restriction.

> Legacy /dev/fuse has an alignment issue as payload follows directly as the fuse
> header - intrinsically fixed in the ring patches.

*nod*

That's the big question, put the data inline (with potential alignment
hassles) or manage (and map) a separate data structure.

Maybe padding could be inserted to solve alignment?

A separate data structure would only really be useful if it enabled zero
copy, but that should probably be a secondary enhancement.

> I will now try without mmap and just provide a user buffer as pointer in the 80B
> section.
>
>
> >
> > We just looked up the device DMA requirements and with modern NVME only
> > 4 byte alignment is required, but the block layer likely isn't set up to
> > handle that.
>
> I think existing fuse headers have and their data have a 4 byte alignment.
> Maybe even 8 byte, I don't remember without looking through all request types.
> If you try a simple O_DIRECT read/write to libfuse/example_passthrough_hp
> without the ring patches it will fail because of alignment. Needs to be fixed
> in legacy fuse and would also avoid compat issues we had in libfuse when the
> kernel header was updated.
>
> >
> > So - prearranged buffer? Or are you using splice to get pages that
> > userspace has read into into the kernel pagecache?
>
> I didn't even try to use splice yet, because for the DDN (my employer) use case
> we cannot use zero copy, at least not without violating the rule that one
> cannot access the application buffer in userspace.

DDN - lustre related?

>
> I will definitely look into Mings work, as it will be useful for others.
>
>
> Cheers,
> Bernd
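
To make the padding idea concrete, here is a rough sketch of how the pad between a header and an inline payload could be computed. Everything in it is made up for illustration: `align_up()`, `payload_offset()` and `pad_bytes()` are hypothetical helpers, and the sizes in the example are not taken from the fuse headers or the ring patches. It assumes the slot offsets are relative to a buffer whose base is itself aligned at least as strictly as the payload.

```c
#include <assert.h>
#include <stddef.h>

/* Round off up to the next multiple of align; align must be a power of two. */
static size_t align_up(size_t off, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (off + align - 1) & ~(align - 1);
}

/*
 * Offset of the payload within the buffer when a header of hdr_size bytes
 * starts at slot_start and the payload must be aligned to align bytes.
 * Hypothetical helper, not from the fuse or io_uring code.
 */
static size_t payload_offset(size_t slot_start, size_t hdr_size, size_t align)
{
	return align_up(slot_start + hdr_size, align);
}

/* Number of padding bytes inserted between the header and the payload. */
static size_t pad_bytes(size_t slot_start, size_t hdr_size, size_t align)
{
	return payload_offset(slot_start, hdr_size, align) - (slot_start + hdr_size);
}
```

As an example with invented numbers: a 40 byte header at offset 0 with a 512 byte payload alignment gives `pad_bytes(0, 40, 512) == 472`, while the same header with only 8 byte alignment needs no padding at all. So the cost of inline data depends heavily on which alignment we actually have to honor.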