On Fri, 2012-04-20 at 12:09 +0100, Stefan Hajnoczi wrote:
> On Fri, Apr 20, 2012 at 8:46 AM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
> > On 20/04/2012 09:00, Nicholas A. Bellinger wrote:

<SNIP>

> > - no support for migration (there can be pending SCSI requests at
> >   migration time, that need to be restarted on the destination)
>
> Yes, and it hasn't been thought through by me at least ;-).  So
> migration is indeed a challenge that needs to be worked through.
>
> > - no support for non-raw images (fix: use NBD on a Unix socket? perhaps
> >   add an NBD backend to lio)
>
> For me this is the biggest issue with kernel-level storage for virtual
> machines.  We have NBD today, but it goes through the network stack
> using a limited protocol and probably can't do zero-copy.
>
> The most promising option I found was dm-userspace
> (http://wiki.xensource.com/xenwiki/DmUserspace), which implements a
> device-mapper target with an in-kernel MMU-like lookup mechanism that
> calls out to userspace when block addresses need to be translated.
> It's not anywhere near upstream and hasn't been pushed for several
> years.  On the plus side, we could also write a userspace
> implementation of this so that QEMU image formats continue to be
> portable to other host OSes without duplicating code.
>
> If tcm_vhost only works with raw images, then I don't see it as a
> realistic option given the effort it will require to complete and
> maintain.
>

There has been interest in the past in creating a TCM backend that
allows a userspace passthrough, but so far the code to do this has not
materialized.  There are pieces of logic from STGT that provide an
interface for doing something similar that still exist in the upstream
kernel.

Allowing different QEMU formats to be processed (in userspace) through a
hybrid TCM backend driver that fits into the existing HBA/DEV layout in
/sys/kernel/config/target/$HBA/$DEV/ is what would be required to really
do this properly.
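As a rough interim illustration of the "NBD on a Unix socket" idea above: qemu-nbd can already expose a non-raw image as a raw block device, which a TCM IBLOCK backstore could then consume via configfs.  This is only a sketch under assumptions; the image path, nbd device, and backstore name below are hypothetical, and this goes through the NBD layer rather than the hybrid userspace backend being discussed:

```
# Hypothetical sketch: expose a qcow2 image as /dev/nbd0, then wrap it
# in a TCM IBLOCK backstore (normally targetcli would drive configfs).
modprobe nbd
qemu-nbd --connect=/dev/nbd0 /var/lib/images/guest.qcow2

# Create and enable an IBLOCK backstore on top of the NBD device:
mkdir -p /sys/kernel/config/target/core/iblock_0/nbd_disk
echo "udev_path=/dev/nbd0" > \
    /sys/kernel/config/target/core/iblock_0/nbd_disk/control
echo 1 > /sys/kernel/config/target/core/iblock_0/nbd_disk/enable
```

The obvious downside, as noted above, is the extra copy through the NBD protocol rather than a zero-copy path.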
> >> In order for QEMU userspace to support this, Linux would need to expose
> >> a method to userspace for issuing DIF protected CDBs.  This userspace
> >> API currently does not exist AFAIK, so a kernel-level approach is
> >> currently the only option when it comes to supporting end-to-end block
> >> protection information originating from within Linux guests.
> >
> > I think it would be worthwhile to have this in userspace too.
> >
> >> (Note this is going to involve a virtio-scsi spec rev as well)
> >
> > Yes.  By the way, another possible modification could be to tell the
> > guest what its (initiator) WWPN is.
>
> Going back to ALUA, I'd like to understand ALUA multipathing a bit
> better.  I've never played with multipath, hence my questions:
>
> I have a SAN with multiple controllers and ALUA support - so ALUA
> multipathing is possible.  Now I want my KVM guests to take advantage
> of multipath themselves.  Since the LIO target virtualizes the SCSI
> bus (the host admin defines LUNs, target ports, and ACLs that do not
> have to map 1:1 to the SAN), we also have to implement ALUA in the
> virtio-scsi target.  The same would be true for QEMU SCSI emulation.
>

virtio-scsi (as a SCSI LLD in the guest) uses the scsi_dh_alua device
handler just like any other SCSI driver does (eg: ALUA is a fabric
independent feature).  That means there are no special requirements for
initiator LLDs to be able to use scsi_dh_alua, other than the target
supporting the ALUA primitives + NAA IEEE extended registered naming to
identify the backend device across multiple paths.

This also currently requires explicit multipathd.conf setup (in the
guest) if the target LUN's vendor/product strings do not match the
default supported ALUA array list in the upstream scsi_dh_alua.c code.

> How would we configure LIO's ALUA in this case?  We really want to
> reflect the port attributes (available/offline,
> optimized/non-optimized) that the external SAN fabric reports.  Is
> this supported by LIO?
>
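For the explicit multipathd.conf setup mentioned above, a guest-side device stanza would look roughly like the following.  This is a hedged sketch: the "LIO-ORG" vendor string matches what LIO-exported LUNs typically report, but the product string and the rest of the settings are illustrative and should be checked against the actual LUN's INQUIRY data:

```
devices {
        device {
                # vendor/product must match the INQUIRY strings of the
                # virtio-scsi LUN as seen inside the guest (illustrative):
                vendor                  "LIO-ORG"
                product                 "IBLOCK"
                # Group paths by ALUA priority and use the ALUA
                # prioritizer + hardware handler:
                path_grouping_policy    group_by_prio
                prio                    alua
                hardware_handler        "1 alua"
                path_checker            tur
                failback                immediate
        }
}
```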
Absolutely.  The ability to set the ALUA primary access state comes for
free with all fabric modules using TCM + virtual backends
(IBLOCK+FILEIO).  The ALUA status appears as attributes under each
endpoint LUN, under:

/sys/kernel/config/target/vhost/naa.60014050088ae39a/tpgt_1/lun/lun_0/alua_tg_pt_*

The 'alua_tg_pt_gp' attr is used to optionally set the fabric LUN's
ALUA target port group membership.  Each fabric target LUN is (by
default) associated with an alua_tg_pt_gp that is specific to the
exported device backend.  Each backend device can have any number of
ALUA tg_pt_gps that exist in a configfs group under
/sys/kernel/config/target/$HBA/$DEV/alua/$TG_PT_GP_NAME.

Here is a quick idea of how a 'default_tg_pt_gp' looks for an IBLOCK
device with multiple fabric exports (iscsi, loopback, vhost):

# head /sys/kernel/config/target/core/iblock_0/mpt_fusion/alua/default_tg_pt_gp/*
==> alua_access_state <==
0

==> alua_access_status <==
None

==> alua_access_type <==
Implict and Explict

==> alua_write_metadata <==
1

==> members <==
iSCSI/iqn.2003-01.org.linux-iscsi.debian-amd64.x8664:sn.6747a471775f/tpgt_1/lun_1
iSCSI/iqn.2003-01.org.linux-iscsi.debian-amd64.x8664:sn.1bc6fcb58f24/tpgt_1/lun_0
loopback/naa.6001405df1bafb29/tpgt_1/lun_0
vhost/naa.60014050088ae39a/tpgt_1/lun_0

==> nonop_delay_msecs <==
100

==> preferred <==
0

==> tg_pt_gp_id <==
0

==> trans_delay_msecs <==
0

All of an ALUA $TG_PT_GP_NAME's members (eg: the exported fabric LUNs)
are required to have the same ALUA primary access state, following
SPC-4 support for ALUA target port groups.  So when the ALUA primary
access state is changed at the backend level, it applies to all fabric
LUNs within the associated ALUA target port group.

There is also a secondary ALUA access state (offline) that can be set
using a generic fabric LUN ALUA attr.  This information is saved into
individual files that allow the active state to persist across target
power loss.
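To make the group mechanics above concrete, here is a hedged shell sketch of creating an extra target port group and flipping its primary access state via configfs.  The group name 'standby_tg_pt_gp' is hypothetical, and the numeric state values are the SPC-4 primary access state codes (0 = Active/Optimized, 1 = Active/NonOptimized, 2 = Standby) as accepted by alua_access_state:

```
# Create an additional ALUA target port group for the backend device
# (group name is hypothetical):
cd /sys/kernel/config/target/core/iblock_0/mpt_fusion/alua
mkdir standby_tg_pt_gp

# Give the new group a unique ID:
echo 1 > standby_tg_pt_gp/tg_pt_gp_id

# Move a fabric LUN into the group by writing the group name into the
# LUN's alua_tg_pt_gp attr:
echo standby_tg_pt_gp > \
    /sys/kernel/config/target/vhost/naa.60014050088ae39a/tpgt_1/lun/lun_0/alua_tg_pt_gp

# Change the group's primary access state to Standby; this applies to
# every fabric LUN that is a member of the group:
echo 2 > standby_tg_pt_gp/alua_access_state
```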
> Does it even make sense to pass the multipathing up into the guest?
> If we terminate it on the host using Linux's ALUA support, we can hide
> multipath entirely from the guest.  Do we lose an obvious advantage by
> terminating multipath in the host instead of the guest?
>

Being able to virtualize ALUA port access states at the host
(Preferred=1, Active/NonOptimized, Standby) provides a nice fabric
independent (and guest OS independent) method for managing path access
for virtio-scsi guest LUNs.

Being able to multiplex I/O to a single vhost-scsi LUN across multiple
vhost interrupt pairs is also likely to be difficult when terminating
multipath at the host level.

--nab
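For contrast, terminating multipath at the host would look roughly like the sketch below: the host's dm-multipath device (already assembled by multipathd from the SAN paths) simply backs a single IBLOCK backstore, and the guest sees one plain LUN with no ALUA state.  Device and backstore names are hypothetical:

```
# Host-side termination sketch (hypothetical names): multipathd has
# already built /dev/mapper/mpatha from the ALUA SAN paths.
mkdir -p /sys/kernel/config/target/core/iblock_1/san_lun
echo "udev_path=/dev/mapper/mpatha" > \
    /sys/kernel/config/target/core/iblock_1/san_lun/control
echo 1 > /sys/kernel/config/target/core/iblock_1/san_lun/enable
# The backstore is then exported to the guest as a single vhost LUN;
# path failover happens entirely in the host's device-mapper layer.
```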