On Wed, 9 Nov 2016 20:23:03 +0100
Christoffer Dall <christoffer.dall@xxxxxxxxxx> wrote:

> On Wed, Nov 09, 2016 at 01:59:07PM -0500, Don Dutile wrote:
> > On 11/09/2016 12:03 PM, Will Deacon wrote:
> > >On Tue, Nov 08, 2016 at 09:52:33PM -0500, Don Dutile wrote:
> > >>On 11/08/2016 06:35 PM, Alex Williamson wrote:
> > >>>On Tue, 8 Nov 2016 21:29:22 +0100
> > >>>Christoffer Dall <christoffer.dall@xxxxxxxxxx> wrote:
> > >>>>Is my understanding correct, that you need to tell userspace about the
> > >>>>location of the doorbell (in the IOVA space) in case (2), because even
> > >>>>though the configuration of the device is handled by the (host) kernel
> > >>>>through trapping of the BARs, we have to avoid the VFIO user programming
> > >>>>the device to create other DMA transactions to this particular address,
> > >>>>since that will obviously conflict and either not produce the desired
> > >>>>DMA transactions or result in unintended weird interrupts?
> > >
> > >Yes, that's the crux of the issue.
> > >
> > >>>Correct, if the MSI doorbell IOVA range overlaps RAM in the VM, then
> > >>>it's potentially a DMA target and we'll get bogus data on DMA read from
> > >>>the device, and lose data and potentially trigger spurious interrupts on
> > >>>DMA write from the device.  Thanks,
> > >>>
> > >>That's b/c the MSI doorbells are not positioned *above* the SMMU, i.e.,
> > >>they address match before the SMMU checks are done.  if
> > >>all DMA addrs had to go through SMMU first, then the DMA access could
> > >>be ignored/rejected.
> > >
> > >That's actually not true :( The SMMU can't generally distinguish between MSI
> > >writes and DMA writes, so it would just see a write transaction to the
> > >doorbell address, regardless of how it was generated by the endpoint.
> > >
> > >Will
> > >
> > So, we have real systems where MSI doorbells are placed at the same IOVA
> > that could have memory for a guest
>
> I don't think this is a property of a hardware system.  The problem is
> userspace not knowing where in the IOVA space the kernel is going to
> place the doorbell, so you can end up (basically by chance) that some
> IPA range of guest memory overlaps with the IOVA space for the doorbell.
> >
> >, but not at the same IOVA as memory on real hw ?
>
> On real hardware without an IOMMU the system designer would have to
> separate the IOVA and RAM in the physical address space.  With an IOMMU,
> the SMMU driver just makes sure to allocate separate regions in the IOVA
> space.
>
> The challenge, as I understand it, happens with the VM, because the VM
> doesn't allocate the IOVA for the MSI doorbell itself, but the host
> kernel does this, independently from the attributes (e.g. memory map) of
> the VM.
>
> Because the IOVA is a single resource, but with two independent entities
> allocating chunks of it (the host kernel for the MSI doorbell IOVA, and
> the VFIO user for other DMA operations), you have to provide some
> coordination between those two entities to avoid conflicts.  In the case
> of KVM, the two entities are the host kernel and the VFIO user (QEMU/the
> VM), and the host kernel informs the VFIO user to never attempt to use
> the doorbell IOVA already reserved by the host kernel for DMA.
>
> One way to do that is to ensure that the IPA space of the VFIO user
> corresponding to the doorbell IOVA is simply not valid, ie. the reserved
> regions that avoid for example QEMU to allocate RAM there.
>
> (I suppose it's technically possible to get around this issue by letting
> QEMU place RAM wherever it wants but tell the guest to never use a
> particular subset of its RAM for DMA, because that would conflict with
> the doorbell IOVA or be seen as p2p transactions.  But I think we all
> probably agree that it's a disgusting idea.)

Well, it's not as if having QEMU or libvirt stumble through sysfs to
figure out where holes could be, in order to instantiate a VM with
matching holes just in case someone might decide to hot-add a device
into the VM at some point, and hoping they don't migrate the VM to
another host with a different layout first, is all that much less
disgusting or foolproof.

It's just that in order to dynamically remove a page as a possible DMA
target we require a paravirt channel, such as a balloon driver that's
able to pluck a specific page.  In some ways it's actually less
disgusting, but it puts some prerequisites on enlightening the guest OS.

Thanks,
Alex
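
For readers following the thread, the conflict under discussion boils down to an interval-overlap test between the doorbell window the host kernel reserved in IOVA space and the guest RAM regions the VFIO user wants to map. The following is only a minimal sketch of that check, not code from QEMU, libvirt, or the kernel: `struct addr_range`, `ranges_overlap()` and `first_doorbell_conflict()` are hypothetical names invented for illustration, and how the reserved window would actually be reported to userspace is exactly what the thread leaves open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: an inclusive [start, end] range in IOVA/IPA space. */
struct addr_range {
    uint64_t start;
    uint64_t end;   /* inclusive */
};

/* Two inclusive ranges overlap iff each starts no later than the other ends. */
static bool ranges_overlap(struct addr_range a, struct addr_range b)
{
    assert(a.start <= a.end && b.start <= b.end);
    return a.start <= b.end && b.start <= a.end;
}

/*
 * Return the index of the first guest RAM region that overlaps the
 * reserved doorbell window, or -1 if the layout is conflict-free.
 * A VFIO user could run a check like this before placing guest RAM.
 */
static int first_doorbell_conflict(const struct addr_range *ram, size_t n,
                                   struct addr_range doorbell)
{
    for (size_t i = 0; i < n; i++) {
        if (ranges_overlap(ram[i], doorbell))
            return (int)i;
    }
    return -1;
}
```

A caller would pass its planned guest RAM layout and the reserved window; any non-negative result means a hole has to be punched in the memory map (or, per the alternative above, the guest has to be told not to use those pages for DMA).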