On Thu, Jul 1, 2021 at 9:56 AM Christoph Hellwig <hch@xxxxxx> wrote:
> On Mon, Jun 28, 2021 at 05:15:58PM +0200, mwilck@xxxxxxxx wrote:
> > The qemu "pr-helper" was specifically invented for it. I
> > believe that this is the most important real-world scenario for sending
> > SG_IO ioctls to device-mapper devices.
>
> pr-helper obviously does not SG_IO on dm-multipath as that simply does
> not work.

Right, for the specific case of persistent reservation ioctls, SG_IO is
sent manually to each path via libmpathpersist. Failover for SG_IO is
needed for general-purpose commands (ranging from INQUIRY/READ CAPACITY
to READ/WRITE).

The reason to use SG_IO instead of syscalls is mostly to preserve sense
data; QEMU does have code to convert errno to sense data, but it's
fickle. If QEMU can use sense data directly, it's easier to forward
conditions that the guest can resolve itself (for example unit
attentions) and to let the guest operate at a lower level (e.g.
host-managed ZBC can be forwarded and they just work).

Of course, all this works only for SCSI. As NVMe becomes more common,
and Linux exposes more functionality to userspace with a fabric-neutral
API, QEMU's SBC emulation can start using that functionality and provide
low-level passthrough functionality no matter if the host is using SCSI
or NVMe. Again, the main obstacle for this is sense data; for example,
the SCSI subsystem rightfully eats unit attentions and converts them to
uevents if you go through read/write requests instead of SG_IO.

> More importantly - if you want to use persistent reservations use the
> kernel ioctls for that. These work on SCSI, NVMe and device mapper
> without any extra magic.

If they provide functionality equivalent to libmpathpersist without
having to do the DM_TABLE_STATUS, I will certainly consider switching!
The only possible issue could be the lost unit attentions.

Paolo

> Failing over SG_IO does not make sense. It is an interface specifically
> designed to leave all error handling to the userspace program using it,
> and we should not change that for one specific error case. If you
> want the kernel to handle errors for you, use the proper interfaces.
> In this case this is the persistent reservation ioctls. If they miss
> some features that qemu needs we should add those.
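
To make the sense-data point above concrete, here is a minimal Python sketch (not QEMU code) of the information a caller gets back in an SG_IO sense buffer and that a bare errno cannot carry. It decodes the sense key, ASC and ASCQ from fixed-format (response codes 0x70/0x71) and descriptor-format (0x72/0x73) sense data. The names Sense, decode_sense and is_unit_attention, and the sample buffer, are made up for illustration.

```python
from typing import NamedTuple

UNIT_ATTENTION = 0x6  # SCSI sense key for UNIT ATTENTION


class Sense(NamedTuple):
    """Hypothetical container for the (sense key, ASC, ASCQ) triple."""
    key: int
    asc: int
    ascq: int


def decode_sense(buf: bytes) -> Sense:
    """Extract sense key / ASC / ASCQ from a raw SCSI sense buffer."""
    if not buf:
        raise ValueError("empty sense buffer")
    code = buf[0] & 0x7F  # bit 7 is the VALID bit in fixed format
    if code in (0x70, 0x71):
        # Fixed format: key in byte 2 (low nibble), ASC/ASCQ in bytes 12/13.
        if len(buf) < 14:
            raise ValueError("fixed-format sense data too short")
        return Sense(buf[2] & 0x0F, buf[12], buf[13])
    if code in (0x72, 0x73):
        # Descriptor format: key in byte 1 (low nibble), ASC/ASCQ in bytes 2/3.
        if len(buf) < 4:
            raise ValueError("descriptor-format sense data too short")
        return Sense(buf[1] & 0x0F, buf[2], buf[3])
    raise ValueError(f"unknown sense response code {code:#x}")


def is_unit_attention(sense: Sense) -> bool:
    """True if the condition is one the guest can usually resolve itself."""
    return sense.key == UNIT_ATTENTION


if __name__ == "__main__":
    # Illustrative buffer: fixed format, UNIT ATTENTION, ASC/ASCQ 29h/00h
    # (POWER ON, RESET, OR BUS DEVICE RESET OCCURRED).
    sample = bytes([0x70, 0, 0x06, 0, 0, 0, 0, 0x0A,
                    0, 0, 0, 0, 0x29, 0x00, 0, 0, 0, 0])
    sense = decode_sense(sample)
    print(f"key={sense.key:#x} asc={sense.asc:#04x} ascq={sense.ascq:#04x}",
          "unit attention" if is_unit_attention(sense) else "")
```

With the full triple available, a condition such as a unit attention can be forwarded to the guest unchanged; an errno on its own does not carry the ASC/ASCQ, so any errno-to-sense conversion has to guess.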