Re: [LSF/MM/BPF TOPIC] block drivers in user space

On 3/14/22 18:04, Mike Christie wrote:
> On 3/3/22 1:09 AM, Hannes Reinecke wrote:
>> On 3/2/22 17:52, Mike Christie wrote:
>>> On 2/21/22 1:59 PM, Gabriel Krisman Bertazi wrote:
>>>> I'd like to discuss an interface to implement user space block devices,
>>>> while avoiding local network NBD solutions.  There has been reiterated
>>>
>>> Besides the tcmu approach, I've also worked on the local nbd based
>>> solution like here:
>>>
>>> https://github.com/gluster/nbd-runner
>>>
>>> Have you looked into a modern take that uses io_uring's socket features
>>> with the zero copy work that's being worked on for it? If so, what are
>>> the issues you have hit with that? Was it mostly issues with the zero
>>> copy part of it?
>>
>> Problem is that we'd need an _inverse_ io_uring interface.
>> The current io_uring interface writes submission queue elements,
>> and waits for completion queue elements.
>
> I'm not sure what you meant here.
>
> io_uring can do recvs right? So userspace nbd would do
> IORING_OP_RECVMSG to wait for drivers/block/nbd.c to send userspace
> cmds via the local socket. Userspace nbd would do IORING_OP_SENDMSG
> to send drivers/block/nbd.c the cmd response.
>
> drivers/block/nbd doesn't know/care what userspace did. It's just
> reading/writing from/to the socket.

I was talking about the internal layout of io_uring.
It sets up a submission ring and a completion ring; the application
writes the command & data into the submission ring, and waits for the
corresponding completion to show up on the completion ring.
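
For the normal direction, that pattern looks roughly like this (a
minimal liburing sketch; the ring is assumed to be initialized via
io_uring_queue_init(), error handling omitted):

#include <liburing.h>

/* Userspace writes an SQE into the submission ring and waits for the
 * matching CQE on the completion ring. */
static int submit_read(struct io_uring *ring, int fd, void *buf, unsigned len)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
	struct io_uring_cqe *cqe;
	int ret;

	io_uring_prep_read(sqe, fd, buf, len, 0);	/* command into the SQ ring */
	io_uring_submit(ring);

	ret = io_uring_wait_cqe(ring, &cqe);		/* wait on the CQ ring */
	if (ret < 0)
		return ret;
	ret = cqe->res;					/* result of the read */
	io_uring_cqe_seen(ring, cqe);
	return ret;
}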

A userspace block driver would need the inverse: waiting for submissions
to show up on the submission ring, and writing completions into the
completion ring.
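
Something of this shape, purely hypothetical (none of these helpers
exist today; the names are made up for illustration):

/* Inverse loop: the kernel block layer produces ring entries,
 * userspace consumes them, performs the I/O against its backing
 * store, and posts the result back as a completion. */
for (;;) {
	struct io_uring_sqe *sqe = wait_for_kernel_sqe(ring);	/* hypothetical */
	int res = handle_block_io(sqe);				/* user's backing I/O */
	post_kernel_cqe(ring, sqe->user_data, res);		/* hypothetical */
}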

recvmsg feels awkward here, as one would need to write a recvmsg op
into the submission ring, reap its completion, handle the I/O, write a
sendmsg op, and then wait for that completion in turn.
I.e. we would double the number of ring operations per block request
(see the sketch below).
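
Spelled out with liburing (a sketch only: 'sock' is assumed to be the
userspace end of the socket handed to drivers/block/nbd.c, NBD
handshake and error handling omitted):

#include <liburing.h>
#include <linux/nbd.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>

/* One iteration per block request: recvmsg the command, handle it,
 * sendmsg the reply -- two SQE/CQE round trips where an inverse
 * interface would need a single submission/completion pair. */
static void serve_one(struct io_uring *ring, int sock)
{
	struct nbd_request req;
	struct nbd_reply reply;
	struct iovec iov = { .iov_base = &req, .iov_len = sizeof(req) };
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;

	/* Op 1: wait for drivers/block/nbd.c to send us a command. */
	sqe = io_uring_get_sqe(ring);
	io_uring_prep_recvmsg(sqe, sock, &msg, MSG_WAITALL);
	io_uring_submit(ring);
	io_uring_wait_cqe(ring, &cqe);
	io_uring_cqe_seen(ring, cqe);

	/* ... perform the I/O described by req against the backing store ... */

	/* Op 2: send the completion back over the socket. */
	memset(&reply, 0, sizeof(reply));
	reply.magic = htonl(NBD_REPLY_MAGIC);
	memcpy(reply.handle, req.handle, sizeof(req.handle));

	iov.iov_base = &reply;
	iov.iov_len = sizeof(reply);
	sqe = io_uring_get_sqe(ring);
	io_uring_prep_sendmsg(sqe, sock, &msg, 0);
	io_uring_submit(ring);
	io_uring_wait_cqe(ring, &cqe);
	io_uring_cqe_seen(ring, cqe);
}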

Sure, it's doable, and admittedly doesn't need (many) modifications to
io_uring. But it still feels like a waste, and we certainly can't reach
maximum performance with that setup.

Cheers,

Hannes
--
Dr. Hannes Reinecke                Kernel Storage Architect
hare@xxxxxxx                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


