On 21/09/17 07:27, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker
>> Sent: Wednesday, September 6, 2017 7:55 PM
>>
>> Hi Kevin,
>>
>> On 28/08/17 08:39, Tian, Kevin wrote:
>>> Here are some comments:
>>>
>>> 1.1 Motivation
>>>
>>> You describe I/O page fault handling as future work. It seems you
>>> considered only recoverable faults (since "aka. PCI PRI" is used).
>>> What about other, unrecoverable faults, e.g. what to do if a virtual
>>> DMA request doesn't find a valid mapping? Even when there is no PRI
>>> support, we need some basic form of fault reporting mechanism to
>>> indicate such errors to the guest.
>>
>> I am considering recoverable faults as the end goal, but reporting
>> unrecoverable faults should use the same queue, with slightly
>> different fields and no need for the driver to reply to the device.
>
> What about adding a placeholder for now? Though the same mechanism
> can be reused, it's an essential part of making the virtio-iommu
> architecture complete, even before talking about support for
> recoverable faults. :-)

I'll see if I can come up with something simple for v0.5, but it seems
like a big chunk of work. I don't really know what to report to the
guest at the moment. I don't want to report vendor-specific details
about the fault, but the report should still carry enough information
to let the guest decide whether it needs to reset/kill the device or
just print something.

[...]

>> Yes, I think adding MEM_T_IDENTITY will be necessary. I can see they
>> are used for both iGPU and USB controllers on my x86 machines. Do
>> you know more precisely what they are used for by the firmware?
>
> The VT-d spec has a clear description:
>
>   3.14 Handling Requests to Reserved System Memory
>   Reserved system memory regions are typically allocated by BIOS at
>   boot time and reported to the OS as reserved address ranges in the
>   system memory map. Requests-without-PASID to these reserved regions
>   may either occur as a result of operations performed by the system
>   software driver (for example in the case of DMA from unified memory
>   access (UMA) graphics controllers to graphics reserved memory), or
>   may be initiated by non system software (for example in case of DMA
>   performed by a USB controller under BIOS SMM control for legacy
>   keyboard emulation). For proper functioning of these legacy
>   reserved memory usages, when system software enables DMA remapping,
>   the second-level translation structures for the respective devices
>   are expected to be set up to provide identity mapping for the
>   specified reserved memory regions with read and write permissions.
>
> (one specific example for the GPU is legacy VGA usage in early boot,
> before the actual graphics driver is loaded)

Thanks for the explanation. So it is only legacy, and enabling nested
mode would be forbidden for a device with Reserved System Memory
regions? I'm wondering if virtio-iommu RESV regions will be extended
to affect a specific PASID (or all requests-with-PASID) in the future.

>> It's not necessary with the base virtio-iommu device though (v0.4),
>> because the device can create the identity mappings itself and
>> report them to the guest as MEM_T_BYPASS. However, when we start
>> handing page
>
> When you say "the device can create ...", I think you really meant
> "the host IOMMU driver can create identity mappings for the assigned
> device", correct?
>
> Then yes, I think the above works.

Yes, it can be the host IOMMU driver, or simply Qemu sending VFIO
ioctls to create those identity mappings (they are reported in sysfs
reserved_regions).
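To make that concrete, here is a rough sketch of what the Qemu side
could look like (not actual Qemu code: hva_of_gpa() is a made-up
placeholder for Qemu's GPA->HVA lookup, and parsing of the
reserved_regions file as well as error handling are omitted):

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    /* Made-up helper standing in for Qemu's GPA->HVA lookup */
    extern void *hva_of_gpa(uint64_t gpa);

    /*
     * Identity-map one region read from the group's reserved_regions
     * file, using the standard VFIO type1 MAP_DMA ioctl. IOVA == GPA,
     * so the region appears untranslated to the guest.
     */
    static int map_identity(int container, uint64_t start, uint64_t end)
    {
        struct vfio_iommu_type1_dma_map map = {
            .argsz = sizeof(map),
            .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
            .iova  = start,                    /* IOVA == GPA */
            .vaddr = (uintptr_t)hva_of_gpa(start),
            .size  = end - start + 1,
        };

        return ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
    }

The nice part is that this case needs nothing new on the host side,
it's all existing VFIO API.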
>> table control over to the guest, the host won't be in control of
>> IOVA->GPA mappings and will need to gracefully ask the guest to do
>> it.
>>
>> I'm not aware of any firmware description resembling Intel RMRR or
>> AMD IVMD on ARM platforms. I do think ARM platforms could need
>> MEM_T_IDENTITY for requesting the guest to map MSI windows when
>> page-table handover is in use (MSI addresses are translated by the
>> physical SMMU, so an IOVA->GPA mapping must be installed by the
>> guest). But since a vSMMU would need a solution as well, I think
>> I'll try to implement something more generic.
>
> Curious: do you need the identity mapping in the full IOVA->GPA->HPA
> translation, or is one in just the GPA->HPA stage sufficient for the
> above MSI scenario?

It has to be IOVA->GPA->HPA, so it'll be a bit complicated to
implement for us. I think we're going to need a VFIO ioctl to tell the
host which IOVA the guest allocated for its MSIs, which isn't ideal.
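Just to illustrate what I mean, a purely hypothetical sketch (nothing
like this exists in VFIO today; the name, fields and request number
are all made up):

    /*
     * Hypothetical ioctl letting userspace tell the host which IOVA
     * the guest chose for its MSI doorbell. The guest installs the
     * IOVA->GPA mapping in its own page tables; the host would use
     * this binding to set up the GPA->HPA stage and program the
     * physical doorbell.
     */
    struct vfio_iommu_type1_msi_binding {
        __u32   argsz;
        __u32   flags;
        __u64   iova;   /* doorbell IOVA allocated by the guest */
        __u64   gpa;    /* doorbell GPA it is mapped to */
        __u64   size;   /* size of the doorbell window */
    };
    #define VFIO_IOMMU_BIND_MSI     _IO(VFIO_TYPE, VFIO_BASE + 42)

Thanks,
Jean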