On 2020/06/17 15:18, Javier González wrote:
> On 17.06.2020 00:38, Damien Le Moal wrote:
>> On 2020/06/17 1:13, Javier González wrote:
>>> On 16.06.2020 09:07, Keith Busch wrote:
>>>> On Tue, Jun 16, 2020 at 05:55:26PM +0200, Javier González wrote:
>>>>> On 16.06.2020 08:48, Keith Busch wrote:
>>>>>> On Tue, Jun 16, 2020 at 05:02:17PM +0200, Javier González wrote:
>>>>>>> This depends very much on how the FS / application is managing
>>>>>>> striping. At the moment, our main use case is enabling user-space
>>>>>>> applications to submit I/Os to raw ZNS devices through the kernel.
>>>>>>>
>>>>>>> Can we enable this use case to start with?
>>>>>>
>>>>>> I think this already provides that. You can set the nsid value to
>>>>>> whatever you want in the passthrough interface, so a namespace block
>>>>>> device is not required to issue I/O to a ZNS namespace from user space.
>>>>>
>>>>> Mmmmm. The problem now is that the check in the nvme driver prevents
>>>>> the ZNS namespace from being initialized. Am I missing something?
>>>>
>>>> Hm, okay, it may not work for you. We need the driver to create at least
>>>> one namespace so that we have tags and a request_queue. If you have that,
>>>> you can issue I/O to any other attached namespace through the passthrough
>>>> interface, but we can't assume there is an available namespace.
>>>
>>> That makes sense for now.
>>>
>>> The next step for us is to enable passthrough on io_uring, making sure
>>> that I/Os do not split.
>>
>> Passthrough as in "the application issues NVMe commands directly", like
>> SG_IO for SCSI? Or do you mean raw block device file accesses by the
>> application, meaning that the I/O goes through the block I/O stack as
>> opposed to going directly to the driver?
>>
>> For the latter case, I do not think it is possible to guarantee that an
>> I/O will not get split unless we are talking about single-page I/Os
>> (e.g. 4K on x86). See a somewhat similar request here and the comments
>> about it:
>>
>> https://www.spinics.net/lists/linux-block/msg55079.html
>
> At the moment we are doing the former, but it looks like a hack to me to
> go directly to the NVMe driver.

That is what the nvme driver ioctl() is for, no? An application can send an
NVMe command directly to the driver with it. That is not a hack, but the
regular way of doing passthrough for NVMe, isn't it?

> I was thinking that we could enable the second path by making use of
> chunk_sectors and limiting the I/O size just as append_max_io_size does.
> Is this completely the wrong way of looking at it?

The block layer cannot limit the size of a passthrough command since the
command is protocol specific and the block layer is a protocol-independent
interface. SCSI SG does not split passthrough requests; it cannot. For a
passthrough command, the command buffer either can be DMA-mapped or it
cannot. If mapping succeeds, the command is issued; if it does not, the
command fails. At least, that is my understanding of how the stack works.

> Thanks,
> Javier
>
> _______________________________________________
> linux-nvme mailing list
> linux-nvme@xxxxxxxxxxxxxxxxxxx
> http://lists.infradead.org/mailman/listinfo/linux-nvme

-- 
Damien Le Moal
Western Digital Research