On Fri, Apr 20, 2012 at 8:46 AM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
> On 20/04/2012 09:00, Nicholas A. Bellinger wrote:
>> On Thu, 2012-04-19 at 19:20 -0500, Anthony Liguori wrote:
>>> TCM runs in the absolute most privileged context possible.  When
>>> you're dealing with extremely hostile input, it's pretty obvious that
>>> you want to run it in the lowest privileged context humanly possible.
>>
>> The argument that a SCSI target for virtual machines is so complex
>> that it can't possibly be implemented properly in the kernel is a
>> bunch of nonsense.
>
> I agree.  A VM is not any more hostile than another iSCSI initiator.
> lio must _always_ assume that it operates in a hostile environment.
>
>> Being able to identify which virtio-scsi guests can actually connect
>> via vhost-scsi into individual tcm_vhost endpoints is step one here.
>
> Yes, the ACL system in lio is quite good for this.
>
>> Well, using a raw device from userspace there is still going to be an
>> SG-IO memcpy going on here between user <-> kernel in the current
>> code, yes..?
>>
>> Being able to deliver interrupts and SGL memory directly into
>> tcm_vhost cmwq kernel context for backend device execution, without
>> QEMU userspace involvement or an extra SGL memcpy, is the perceived
>> performance benefit here.
>>
>> How much benefit will this actually provide across single-port and
>> multi-port tcm_vhost LUNs into a single guest..?  That still remains
>> to be demonstrated with performance and throughput benchmarks.
>
> Yes, this is the key.

The overall goal is for virtio-scsi to compete with or be faster than
virtio-blk, whether we go the tcm_vhost route or the QEMU SCSI emulation
route.  So Cong and I discussed the details of such a benchmark
yesterday.  The results will be presented to the QEMU community once
they have been collected - maybe a topic for the KVM community call.

> The problems I have with vhost-scsi are, from easiest to hardest:
>
> - completely different configuration mechanism with respect to the
>   in-QEMU target (fix: need to integrate configfs into
>   scsi-{disk,generic}).

Why is this a problem?  The target is a lot richer than QEMU's SCSI
emulation.  All the ACLs and other configuration should be done using
RTSadmin or configfs; I don't think it makes sense to duplicate that in
QEMU.

> - no support for migration (there can be pending SCSI requests at
>   migration time that need to be restarted on the destination)

Yes, and it hasn't been thought through, by me at least ;-).  Migration
is indeed a challenge that needs to be worked through.

> - no support for non-raw images (fix: use NBD on a Unix socket?
>   perhaps add an NBD backend to lio)

For me this is the biggest issue with kernel-level storage for virtual
machines.  We have NBD today, but it goes through the network stack
using a limited protocol and probably can't do zero-copy.

The most promising option I found was dm-userspace
(http://wiki.xensource.com/xenwiki/DmUserspace), which implements a
device-mapper target with an in-kernel MMU-like lookup mechanism that
calls out to userspace when block addresses need to be translated.  It
is nowhere near upstream and hasn't been pushed for several years.  On
the plus side, we could also write a userspace implementation of it so
that QEMU image formats remain portable to other host OSes without
duplicating code.

If tcm_vhost only works with raw images, then I don't see it as a
realistic option given the effort it will require to complete and
maintain.
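As a strawman for the "NBD on a Unix socket" idea, below is a rough
sketch of the kind of toy userspace NBD server I am picturing: it
exports a raw image file over a Unix socket using the legacy oldstyle
handshake.  The protocol constants come from the published NBD spec,
but this is only an illustration of the approach, not something lio or
QEMU would actually ship, and the paths in the comments are made up.

import os
import socket
import struct

NBD_REQUEST_MAGIC = 0x25609513
NBD_REPLY_MAGIC = 0x67446698
NBD_CMD_READ, NBD_CMD_WRITE, NBD_CMD_DISC = 0, 1, 2
OLDSTYLE_CLISERV_MAGIC = 0x00420281861253

def recv_exact(conn, n):
    # Read exactly n bytes or raise if the client went away.
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise EOFError("client disconnected")
        buf += chunk
    return buf

def serve(image_path, socket_path):
    size = os.path.getsize(image_path)
    if os.path.exists(socket_path):
        os.unlink(socket_path)
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(socket_path)
    srv.listen(1)
    conn, _ = srv.accept()
    with open(image_path, "r+b") as img:
        # Oldstyle handshake: passwd, magic, export size, flags, 124 pad bytes.
        conn.sendall(b"NBDMAGIC"
                     + struct.pack(">QQI", OLDSTYLE_CLISERV_MAGIC, size, 0)
                     + b"\0" * 124)
        while True:
            # Each request: magic, command, handle, offset, length (28 bytes).
            magic, cmd, handle, offset, length = \
                struct.unpack(">IIQQI", recv_exact(conn, 28))
            if magic != NBD_REQUEST_MAGIC or cmd == NBD_CMD_DISC:
                break
            if cmd == NBD_CMD_WRITE:
                data = recv_exact(conn, length)
                img.seek(offset)
                img.write(data)
                conn.sendall(struct.pack(">IIQ", NBD_REPLY_MAGIC, 0, handle))
            elif cmd == NBD_CMD_READ:
                img.seek(offset)
                conn.sendall(struct.pack(">IIQ", NBD_REPLY_MAGIC, 0, handle)
                             + img.read(length))
    conn.close()

# Example (paths made up): serve("/tmp/guest.img", "/tmp/guest-nbd.sock"),
# then point a client that speaks the oldstyle handshake at the socket,
# e.g. QEMU's file=nbd:unix:/tmp/guest-nbd.sock syntax.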
>> In order for QEMU userspace to support this, Linux would need to
>> expose a method to userspace for issuing DIF-protected CDBs.  This
>> userspace API currently does not exist AFAIK, so a kernel-level
>> approach is currently the only option when it comes to supporting
>> end-to-end block protection information originating from within
>> Linux guests.
>
> I think it would be worthwhile to have this in userspace too.
>
>> (Note this is going to involve a virtio-scsi spec rev as well)
>
> Yes.  By the way, another possible modification could be to tell the
> guest what its (initiator) WWPN is.

Going back to ALUA, I'd like to understand ALUA multipathing a bit
better.  I've never played with multipath, hence my questions:

I have a SAN with multiple controllers and ALUA support, so ALUA
multipathing is possible.  Now I want my KVM guests to take advantage
of multipath themselves.

Since the LIO target virtualizes the SCSI bus (the host admin defines
LUNs, target ports, and ACLs that do not have to map 1:1 to the SAN),
we also have to implement ALUA in the virtio-scsi target.  The same
would be true for QEMU SCSI emulation.

How would we configure LIO's ALUA in this case?  We really want to
reflect the port attributes (available/offline,
optimized/non-optimized) that the external SAN fabric reports.  Is this
supported by LIO?

Does it even make sense to pass multipathing up into the guest?  If we
terminate it on the host using Linux's ALUA support, we can hide
multipath entirely from the guest.  Do we lose an obvious advantage by
terminating multipath in the host instead of the guest?

Stefan
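P.S. To make the "reflect the SAN's port attributes" question concrete,
here is roughly what I am imagining on the host side.  The configfs
paths and the numeric alua_access_state encodings below are assumptions
from memory of lio's per-backstore default target port group, and the
backstore name is made up - please correct me if lio expects something
different.

import os

# Assumed lio configfs layout for an iblock backstore; the name is made up.
BACKSTORE = "/sys/kernel/config/target/core/iblock_0/san_lun0"

# SPC-4 ALUA access states as numeric values; I believe this is what
# lio's alua_access_state attribute expects, but it needs verifying.
ALUA_STATE = {
    "active/optimized": 0,
    "active/non-optimized": 1,
    "standby": 2,
    "unavailable": 3,
}

def reflect_san_state(state_name):
    # Mirror the path state reported by the external SAN into the
    # backstore's default ALUA target port group, so a guest attached
    # via tcm_vhost sees the same state through virtio-scsi.
    path = os.path.join(BACKSTORE, "alua", "default_tg_pt_gp",
                        "alua_access_state")
    with open(path, "w") as f:
        f.write(str(ALUA_STATE[state_name]))

# A small daemon on the host could poll the SAN (e.g. with sg_rtpg from
# sg3_utils) and call reflect_san_state() whenever the fabric reports a
# change:
#   reflect_san_state("active/non-optimized")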