On Mon, 24 Feb 2020 19:49:53 +0100
Halil Pasic <pasic@xxxxxxxxxxxxx> wrote:

> On Mon, 24 Feb 2020 14:33:14 +1100
> David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> > On Fri, Feb 21, 2020 at 07:07:02PM +0100, Halil Pasic wrote:
> > > On Fri, 21 Feb 2020 10:48:15 -0500
> > > "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> > >
> > > > On Fri, Feb 21, 2020 at 02:06:39PM +0100, Halil Pasic wrote:
> > > > > On Fri, 21 Feb 2020 14:27:27 +1100
> > > > > David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > > On Thu, Feb 20, 2020 at 05:31:35PM +0100, Christoph Hellwig wrote:
> > > > > > > On Thu, Feb 20, 2020 at 05:23:20PM +0100, Christian Borntraeger wrote:
> > > > > > > > From a user's perspective it makes absolutely perfect sense to use the
> > > > > > > > bounce buffers when they are NEEDED.
> > > > > > > > Forcing the user to specify iommu_platform just because you need bounce buffers
> > > > > > > > really feels wrong. And obviously we have a severe performance issue
> > > > > > > > because of the indirections.
> > > > > > >
> > > > > > > The point is that the user should not have to specify iommu_platform.
> > > > > > > We need to make sure any new hypervisor (especially one that might require
> > > > > > > bounce buffering) always sets it,
> > > > > >
> > > > > > So, I have draft qemu patches which enable iommu_platform by default.
> > > > > > But that's really because of other problems with !iommu_platform, not
> > > > > > anything to do with bounce buffering or secure VMs.
> > > > > >
> > > > > > The thing is that the hypervisor *doesn't* require bounce buffering.
> > > > > > In the POWER (and maybe s390 as well) models for Secure VMs, it's the
> > > > > > *guest*'s choice to enter secure mode, so the hypervisor has no reason
> > > > > > to know whether the guest needs bounce buffering.
> > > > > > As far as the hypervisor and qemu are concerned that's a guest
> > > > > > internal detail, it just expects to get addresses it can access
> > > > > > whether those are GPAs (iommu_platform=off) or IOVAs
> > > > > > (iommu_platform=on).
> > > > >
> > > > > I very much agree!
> > > > >
> > > > > > > as was a rather bogus legacy hack
> > > > > >
> > > > > > It was certainly a bad idea, but it was a bad idea that went into a
> > > > > > public spec and has been widely deployed for many years. We can't
> > > > > > just pretend it didn't happen and move on.
> > > > > >
> > > > > > Turning iommu_platform=on by default breaks old guests, some of which
> > > > > > we still care about. We can't (automatically) do it only for guests
> > > > > > that need bounce buffering, because the hypervisor doesn't know that
> > > > > > ahead of time.

We could default to iommu_platform=on on s390 when the host has active
support for protected virtualization... but that's just another kind of
horrible, so let's just pretend I didn't suggest it.

> > > > > Turning iommu_platform=on for virtio-ccw makes no sense whatsoever,
> > > > > because for CCW I/O there is no such thing as IOMMU and the addresses
> > > > > are always physical addresses.
> > > >
> > > > Fix the name then. The spec calls it ACCESS_PLATFORM now, which
> > > > makes much more sense.
> > >
> > > I don't quite get it. Sorry. Maybe I will revisit this later.
> >
> > Halil, I think I can clarify this.
> >
> > The "iommu_platform" flag doesn't necessarily have anything to do with
> > an iommu, although it often will. Basically it means "access guest
> > memory via the bus's normal DMA mechanism" rather than "access guest
> > memory using GPA, because you're the hypervisor and you can do that".
>
> Unfortunately, I don't think this is what is conveyed to the end users.
> Let's see what we have documented:
>
> Neither Qemu user documentation
> (https://www.qemu.org/docs/master/qemu-doc.html) nor online help:
> qemu-system-s390x -device virtio-net-ccw,?|grep iommu
>   iommu_platform=<bool> - on/off (default: false)
> has any documentation on it.

Now, that's 'helpful' -- this certainly calls out for a bit of doc...

>
> But libvirt does have documentation on the knob that controls
> iommu_platform for QEMU (when QEMU is managed by libvirt):
> """
> Virtio-related options
>
> QEMU's virtio devices have some attributes related to the virtio
> transport under the driver element: The iommu attribute enables the use
> of emulated IOMMU by the device. The attribute ats controls the Address
> Translation Service support for PCIe devices. This is needed to make use
> of IOTLB support (see IOMMU device). Possible values are on or off.
> Since 3.5.0
> """
> (https://libvirt.org/formatdomain.html#elementsVirtio)
>
> Thus it seems the only available documentation says that it "enables the use
> of emulated IOMMU by the device".
>
> And for vhost-user we have
> """
> When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has not been negotiated:
>
> * Guest addresses map to the vhost memory region containing that guest
>   address.
>
> When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated:
>
> * Guest addresses are also called I/O virtual addresses (IOVAs). They are
>   translated to user addresses via the IOTLB.
> """
> (docs/interop/vhost-user.rst)
>
> > For the case of ccw, both mechanisms end up being the same thing,
> > since CCW's normal DMA *is* untranslated GPA access.
>
> Nod.
>
> > For this reason, the flag in the spec was renamed to ACCESS_PLATFORM,
> > but the flag in qemu still has the old name.
>
> Yes, the name in the spec is more neutral.
>
> > AIUI, Michael is saying you could trivially change the name in qemu
> > (obviously you'd need to alias the old name to the new one for
> > compatibility).
>
> I could, and then I could also ask the libvirt guys to change <driver
> iommu='X'> to <driver access_platform='X'> or similar and to change
> their documentation to something that is harder to comprehend. Although
> I'm not sure they would like the idea.

Hopefully, the documentation can be changed to something that is _not_
harder to comprehend :) (with a bit more text, I suppose.) Renaming to
something like access_platform seems like a good idea, even with the
required compat dance.

> > Actually, the fact that ccw has no translation makes things easier for
> > you: you don't really have any impediment to turning ACCESS_PLATFORM
> > on by default, since it doesn't make any real change to how things
> > work.
>
> Yeah, it should not, in theory, but currently it does in practice.
> Currently vhost will try to do the IOTLB dance (Jason has a patch that
> should help with that), and we get the 'use dma api' side effects in the
> guest (e.g. virtqueue's data go <2G + some overhead).

Nod.

> > The remaining difficulty is that the virtio driver - since it can sit
> > on multiple buses - won't know this, and will reject the
> > ACCESS_PLATFORM flag, even though it could just do what it normally
> > does on ccw and it would work.
>
> Right, ACCESS_PLATFORM is a funny feature where the device refuses to
> work if the driver does not ack.
>
> > For that case, we could consider a hack in qemu where for virtio-ccw
> > devices *only* we allow the guest to nack the ACCESS_PLATFORM flag and
> > carry on anyway. Normally we insist that the guest accept the
> > ACCESS_PLATFORM flag if offered, because on most platforms they
> > *don't* amount to the same thing.
>
> Jason found a nice way to differentiate between needs translation and
> does not need translation. But that patch still requires the ack by the
> driver (and as Michael has pointed out we have to consider migration).
>
> I'm afraid that F_IOMMU_PLATFORM means different things in different
> contexts, and that this ain't sufficiently documented. I'm tempted to do
> a proper write-up on this (let's hope my motivation and my time
> will allow). I would also very much like to have Conny's opinion on this.

More documentation is never a bad idea; but I'm afraid I don't have any
further insights at the moment.
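
For reference, the negotiation rule discussed above (the device refuses to
work if ACCESS_PLATFORM is offered but not acked, plus the suggested
ccw-only exception) could be modelled roughly as below. This is only an
illustrative sketch, not QEMU code: the function name and the is_ccw /
allow_ccw_nack parameters are made up for the example; only the feature
bit number (33) comes from the virtio spec.

```python
# Illustrative model only; not QEMU code. Everything except the feature
# bit number is a hypothetical name chosen for this sketch.
VIRTIO_F_ACCESS_PLATFORM = 33  # feature bit number from the virtio spec


def negotiation_ok(offered: int, acked: int, is_ccw: bool = False,
                   allow_ccw_nack: bool = False) -> bool:
    """Return True if the device should accept the driver's feature set."""
    bit = 1 << VIRTIO_F_ACCESS_PLATFORM
    if acked & ~offered:
        # The driver may only ack features the device offered.
        return False
    if not (offered & bit):
        # Flag not offered: nothing to enforce.
        return True
    if acked & bit:
        # Flag offered and acked: the normal, accepted case.
        return True
    # Flag offered but not acked: normally the device refuses to work.
    # The suggested hack would tolerate this for virtio-ccw only, where
    # platform DMA and untranslated GPA access are the same thing.
    return is_ccw and allow_ccw_nack
```

With the defaults, an offered-but-nacked flag is rejected on every
transport; the exception only applies when both is_ccw and allow_ccw_nack
are set. Migration is not considered in this sketch.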