On Thu, Feb 16, 2023 at 08:46:56AM +0800, Ming Lei wrote:
> On Wed, Feb 15, 2023 at 10:27:07AM -0500, Stefan Hajnoczi wrote:
> > On Wed, Feb 15, 2023 at 08:51:27AM +0800, Ming Lei wrote:
> > > On Mon, Feb 13, 2023 at 02:13:59PM -0500, Stefan Hajnoczi wrote:
> > > > On Mon, Feb 13, 2023 at 11:47:31AM +0800, Ming Lei wrote:
> > > > > On Wed, Feb 08, 2023 at 07:17:10AM -0500, Stefan Hajnoczi wrote:
> > > > > > On Wed, Feb 08, 2023 at 10:12:19AM +0800, Ming Lei wrote:
> > > > > > > On Mon, Feb 06, 2023 at 03:27:09PM -0500, Stefan Hajnoczi wrote:
> > > > > > > > On Mon, Feb 06, 2023 at 11:00:27PM +0800, Ming Lei wrote:
> > > > > > > > > Hello,
> > > > > > > > >
> > > > > > > > > So far UBLK is only used for implementing virtual block device from
> > > > > > > > > userspace, such as loop, nbd, qcow2, ...[1].
> > > > > > > >
> > > > > > > > I won't be at LSF/MM so here are my thoughts:
> > > > > > >
> > > > > > > Thanks for the thoughts, :-)
> > > > > > >
> > > > > > > > >
> > > > > > > > > It could be useful for UBLK to cover real storage hardware too:
> > > > > > > > >
> > > > > > > > > - for fast prototype or performance evaluation
> > > > > > > > >
> > > > > > > > > - some network storages are attached to host, such as iscsi and nvme-tcp,
> > > > > > > > > the current UBLK interface doesn't support such devices, since it needs
> > > > > > > > > all LUNs/Namespaces to share host resources(such as tag)
> > > > > > > >
> > > > > > > > Can you explain this in more detail? It seems like an iSCSI or
> > > > > > > > NVMe-over-TCP initiator could be implemented as a ublk server today.
> > > > > > > > What am I missing?
> > > > > > >
> > > > > > > The current ublk can't do that yet, because the interface doesn't
> > > > > > > support multiple ublk disks sharing single host, which is exactly
> > > > > > > the case of scsi and nvme.
> > > > > >
> > > > > > Can you give an example that shows exactly where a problem is hit?
> > > > > >
> > > > > > I took a quick look at the ublk source code and didn't spot a place
> > > > > > where it prevents a single ublk server process from handling multiple
> > > > > > devices.
> > > > > >
> > > > > > Regarding "host resources(such as tag)", can the ublk server deal with
> > > > > > that in userspace? The Linux block layer doesn't have the concept of a
> > > > > > "host", that would come in at the SCSI/NVMe level that's implemented in
> > > > > > userspace.
> > > > > >
> > > > > > I don't understand yet...
> > > > >
> > > > > blk_mq_tag_set is embedded into driver host structure, and referred by queue
> > > > > via q->tag_set, both scsi and nvme allocates tag in host/queue wide,
> > > > > that said all LUNs/NSs share host/queue tags, current every ublk
> > > > > device is independent, and can't shard tags.
> > > >
> > > > Does this actually prevent ublk servers with multiple ublk devices or is
> > > > it just sub-optimal?
> > >
> > > It is former, ublk can't support multiple devices which share single host
> > > because duplicated tag can be seen in host side, then io is failed.
> >
> > The kernel sees two independent block devices so there is no issue
> > within the kernel.
>
> This way either wastes memory, or performance is bad since we can't
> make a perfect queue depth for each ublk device.
>
> >
> > Userspace can do its own hw tag allocation if there are shared storage
> > controller resources (e.g. NVMe CIDs) to avoid duplicating tags.
> >
> > Have I missed something?
>
> Please look at lib/sbitmap.c and block/blk-mq-tag.c and see how many
> hard issues fixed/reported in the past, and how much optimization done
> in this area.
>
> In theory hw tag allocation can be done in userspace, but just hard to
> do efficiently:
>
> 1) it has been proved as one hard task for sharing data efficiently in
> SMP, so don't reinvent wheel in userspace, and this work could take
> much more efforts than extending current ublk interface, and just
> fruitless
>
> 2) two times tag allocation slows down io path much
>
> 2) even worse for userspace allocation, cause task can be killed and
> no cleanup is done, so tag leak can be caused easily

So then it is not "the former" after all?

Stefan
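To make the disagreement concrete, here is a minimal sketch of the userspace alternative, not ublk or blk-mq code. It assumes a hypothetical ublk server exporting several ublk devices that all sit behind one host (for example a single NVMe-over-TCP controller) whose queue accepts at most 64 outstanding commands. Each ublk device hands the server its own tags starting from 0, so forwarding them unchanged would put duplicate tags on the host queue; the server therefore performs a second, host-wide allocation (the "two times tag allocation" above). All names (`host_tags`, `host_tag_get`, `forward_io`, `HOST_DEPTH`, ...) are hypothetical, the single-word lock-free bitmap is far simpler than lib/sbitmap.c, and `__builtin_ctzll` is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host queue depth; 64 lets one atomic word hold the bitmap. */
#define HOST_DEPTH 64
/* Hypothetical per-ublk-device queue depth. */
#define DEV_DEPTH 32

/* Host-wide tag pool shared by every ublk device behind the same host. */
struct host_tags {
    _Atomic uint64_t bitmap; /* bit i set => host tag i is in flight */
};

/* Per-device translation table: ublk tag -> host tag (-1 = none). */
struct dev_map {
    int host_tag[DEV_DEPTH];
};

/* Grab the lowest free host tag, or return -1 if the host queue is full. */
static int host_tag_get(struct host_tags *h)
{
    uint64_t old = atomic_load_explicit(&h->bitmap, memory_order_relaxed);

    for (;;) {
        if (old == UINT64_MAX)
            return -1;                       /* every host tag is busy */
        int bit = __builtin_ctzll(~old);     /* lowest clear bit */
        uint64_t next = old | (UINT64_C(1) << bit);
        /* Acquire pairs with the release in host_tag_put(). */
        if (atomic_compare_exchange_weak_explicit(&h->bitmap, &old, next,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return bit;
        /* CAS failed: 'old' was reloaded with the current value; retry. */
    }
}

/* Release a host tag; asserts that it was actually allocated. */
static void host_tag_put(struct host_tags *h, int tag)
{
    uint64_t mask = UINT64_C(1) << tag;
    uint64_t prev = atomic_fetch_and_explicit(&h->bitmap, ~mask,
                                              memory_order_release);
    assert(prev & mask); /* catches a double free */
    (void)prev;          /* silence unused warning when NDEBUG is set */
}

/* Second allocation: map a ublk tag (0 <= dev_tag < DEV_DEPTH assumed)
 * to a host tag before sending the command to the shared host. */
static int forward_io(struct host_tags *h, struct dev_map *m, int dev_tag)
{
    int ht = host_tag_get(h);

    if (ht >= 0)
        m->host_tag[dev_tag] = ht;
    return ht; /* -1 means the caller must requeue or wait */
}

/* On completion, return the host tag to the shared pool. */
static void complete_io(struct host_tags *h, struct dev_map *m, int dev_tag)
{
    host_tag_put(h, m->host_tag[dev_tag]);
    m->host_tag[dev_tag] = -1;
}
```

Two devices that both submit ublk tag 0 end up with distinct host tags, at the cost of an extra atomic operation per I/O. The sketch does not address the leak concern: if the server process dies between `host_tag_get()` and `host_tag_put()`, the bit stays set and that host tag is lost unless something outside the process resets the pool.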