On Fri, Apr 20, 2012 at 8:46 AM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
> On 20/04/2012 09:00, Nicholas A. Bellinger wrote:
>> On Thu, 2012-04-19 at 19:20 -0500, Anthony Liguori wrote:
>>> TCM runs in the absolute most privileged context possible.  When
>>> you're dealing with extremely hostile input, it's pretty obvious that
>>> you want to run it in the lowest privileged context humanly possible.
>>
>> The argument that a SCSI target for virtual machines is so complex
>> that it can't possibly be implemented properly in the kernel is a
>> bunch of nonsense.
>
> I agree.  A VM is not any more hostile than another iSCSI initiator.
> lio must _always_ assume that it operates in a hostile environment.
>
>> Being able to identify which virtio-scsi guests can actually connect
>> via vhost-scsi into individual tcm_vhost endpoints is step one here.
>
> Yes, the ACL system in lio is quite good for this.
>
>> Well, using a raw device from userspace there is still going to be an
>> SG-IO memcpy going on here between user <-> kernel in the current
>> code, yes..?
>>
>> Being able to deliver interrupts and SGL memory directly into
>> tcm_vhost cmwq kernel context for backend device execution, without
>> QEMU userspace involvement or an extra SGL memcpy, is the perceived
>> performance benefit here.
>>
>> How much benefit will this actually provide across single-port and
>> multi-port tcm_vhost LUNs into a single guest..?  That still remains
>> to be demonstrated with performance and throughput benchmarks.
>
> Yes, this is the key.

The overall goal is for virtio-scsi to compete with or be faster than
virtio-blk, whether we go the tcm_vhost route or the QEMU SCSI emulation
route.  So Cong and I discussed the details of such a benchmark
yesterday.  The results will be presented to the QEMU community once
they have been collected - maybe a topic for the KVM community call.

> The problems I have with vhost-scsi are, from easiest to hardest:
>
> - completely different configuration mechanism with respect to the
>   in-QEMU target (fix: need to integrate configfs into
>   scsi-{disk,generic}).

Why is this a problem?  The target is a lot richer than QEMU's SCSI
emulation.  All the ACLs and other configuration should be done using
RTSadmin or configfs; I don't think it makes sense to duplicate that in
QEMU.

> - no support for migration (there can be pending SCSI requests at
>   migration time that need to be restarted on the destination)

Yes, and it hasn't been thought through, by me at least ;-).  Migration
is indeed a challenge that needs to be worked through.

> - no support for non-raw images (fix: use NBD on a Unix socket?
>   perhaps add an NBD backend to lio)

For me this is the biggest issue with kernel-level storage for virtual
machines.  We have NBD today, but it goes through the network stack
using a limited protocol and probably can't do zero-copy.

The most promising option I found was dm-userspace
(http://wiki.xensource.com/xenwiki/DmUserspace), which implements a
device-mapper target with an in-kernel MMU-like lookup mechanism that
calls out to userspace when block addresses need to be translated.  It
is nowhere near upstream and hasn't been pushed for several years.  On
the plus side, we could also write a userspace implementation of it so
that QEMU image formats remain portable to other host OSes without
duplicating code.

If tcm_vhost only works with raw images, then I don't see it as a
realistic option given the effort it will require to complete and
maintain.
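As a strawman for the "NBD on a Unix socket" idea, below is a rough
sketch of the kind of toy userspace NBD server I am picturing: it
exports a raw image file over a Unix socket using the legacy oldstyle
handshake.  The protocol constants come from the published NBD spec,
but this is only an illustration of the approach, not something lio or
QEMU would actually ship, and the paths in the comments are made up.

import os
import socket
import struct

NBD_REQUEST_MAGIC = 0x25609513
NBD_REPLY_MAGIC = 0x67446698
NBD_CMD_READ, NBD_CMD_WRITE, NBD_CMD_DISC = 0, 1, 2
OLDSTYLE_CLISERV_MAGIC = 0x00420281861253

def recv_exact(conn, n):
    # Read exactly n bytes or raise if the client went away.
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise EOFError("client disconnected")
        buf += chunk
    return buf

def serve(image_path, socket_path):
    size = os.path.getsize(image_path)
    if os.path.exists(socket_path):
        os.unlink(socket_path)
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(socket_path)
    srv.listen(1)
    conn, _ = srv.accept()
    with open(image_path, "r+b") as img:
        # Oldstyle handshake: passwd, magic, export size, flags, 124 pad bytes.
        conn.sendall(b"NBDMAGIC"
                     + struct.pack(">QQI", OLDSTYLE_CLISERV_MAGIC, size, 0)
                     + b"\0" * 124)
        while True:
            # Each request: magic, command, handle, offset, length (28 bytes).
            magic, cmd, handle, offset, length = \
                struct.unpack(">IIQQI", recv_exact(conn, 28))
            if magic != NBD_REQUEST_MAGIC or cmd == NBD_CMD_DISC:
                break
            if cmd == NBD_CMD_WRITE:
                data = recv_exact(conn, length)
                img.seek(offset)
                img.write(data)
                conn.sendall(struct.pack(">IIQ", NBD_REPLY_MAGIC, 0, handle))
            elif cmd == NBD_CMD_READ:
                img.seek(offset)
                conn.sendall(struct.pack(">IIQ", NBD_REPLY_MAGIC, 0, handle)
                             + img.read(length))
    conn.close()

# Example (paths made up): serve("/tmp/guest.img", "/tmp/guest-nbd.sock"),
# then point a client that speaks the oldstyle handshake at the socket,
# e.g. QEMU's file=nbd:unix:/tmp/guest-nbd.sock syntax.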
>> In order for QEMU userspace to support this, Linux would need to
>> expose a method to userspace for issuing DIF-protected CDBs.  This
>> userspace API currently does not exist AFAIK, so a kernel-level
>> approach is currently the only option when it comes to supporting
>> end-to-end block protection information originating from within
>> Linux guests.
>
> I think it would be worthwhile to have this in userspace too.
>
>> (Note this is going to involve a virtio-scsi spec rev as well)
>
> Yes.  By the way, another possible modification could be to tell the
> guest what its (initiator) WWPN is.

Going back to ALUA, I'd like to understand ALUA multipathing a bit
better.  I've never played with multipath, hence my questions:

I have a SAN with multiple controllers and ALUA support, so ALUA
multipathing is possible.  Now I want my KVM guests to take advantage
of multipath themselves.

Since the LIO target virtualizes the SCSI bus (the host admin defines
LUNs, target ports, and ACLs that do not have to map 1:1 to the SAN),
we also have to implement ALUA in the virtio-scsi target.  The same
would be true for QEMU SCSI emulation.

How would we configure LIO's ALUA in this case?  We really want to
reflect the port attributes (available/offline,
optimized/non-optimized) that the external SAN fabric reports.  Is this
supported by LIO?

Does it even make sense to pass multipathing up into the guest?  If we
terminate it on the host using Linux's ALUA support, we can hide
multipath entirely from the guest.  Do we lose an obvious advantage by
terminating multipath in the host instead of the guest?

Stefan
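P.S. To make the "reflect the SAN's port attributes" question concrete,
here is roughly what I am imagining on the host side.  The configfs
paths and the numeric alua_access_state encodings below are assumptions
from memory of lio's per-backstore default target port group, and the
backstore name is made up - please correct me if lio expects something
different.

import os

# Assumed lio configfs layout for an iblock backstore; the name is made up.
BACKSTORE = "/sys/kernel/config/target/core/iblock_0/san_lun0"

# SPC-4 ALUA access states as numeric values; I believe this is what
# lio's alua_access_state attribute expects, but it needs verifying.
ALUA_STATE = {
    "active/optimized": 0,
    "active/non-optimized": 1,
    "standby": 2,
    "unavailable": 3,
}

def reflect_san_state(state_name):
    # Mirror the path state reported by the external SAN into the
    # backstore's default ALUA target port group, so a guest attached
    # via tcm_vhost sees the same state through virtio-scsi.
    path = os.path.join(BACKSTORE, "alua", "default_tg_pt_gp",
                        "alua_access_state")
    with open(path, "w") as f:
        f.write(str(ALUA_STATE[state_name]))

# A small daemon on the host could poll the SAN (e.g. with sg_rtpg from
# sg3_utils) and call reflect_san_state() whenever the fabric reports a
# change:
#   reflect_san_state("active/non-optimized")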