New iotcls were introduced to pass information about guest stage1 to the host through VFIO. Let's document the nested stage control. Signed-off-by: Eric Auger <eric.auger@xxxxxxxxxx> --- fault reporting is current missing to the picture --- Documentation/vfio.txt | 45 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+) diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt index f1a4d3c..858a363 100644 --- a/Documentation/vfio.txt +++ b/Documentation/vfio.txt @@ -239,6 +239,51 @@ group and can access them as follows:: /* Gratuitous device reset and go... */ ioctl(device, VFIO_DEVICE_RESET); +IOMMU Dual Stage Control +------------------------ + +Some IOMMUs support 2 stages of translation. This is useful when the +guest is exposed with a virtual IOMMU and some devices are assigned +to the guest through VFIO. Then the guest OS can use stage 1 (IOVA -> GPA), +while the hypervisor uses stage 2 for VM isolation (GPA -> HPA). + +The guest gets ownership of the stage 1 page tables and also owns stage 1 +configuration structures. The hypervisor owns the root configuration structure +(for security reason), including stage 2 configuration. This works as long +configuration structures and page table format are compatible between the +virtual IOMMU and the physical IOMMU. + +Assuming the HW supports it, this nested mode is selected by choosing the +VFIO_TYPE1_NESTING_IOMMU type through: + +ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU); + +This forces the hypervisor to use the stage 2, leaving stage 1 available for +guest usage. + +Once groups are attached to the container, the guest stage 1 translation +configuration data can be passed to VFIO by using + +ioctl(container, VFIO_IOMMU_BIND_GUEST_STAGE, &guest_stage_info); + +This allows to combine guest stage1 configuration structure along with hypervisor +stage 2 configuration structure. stage 1 configuration structures are dependent +on the IOMMU type. + +When the guest invalidates stage 1 entries, IOTLB invalidations must be forwarded +to the host through +ioctl(container, VFIO_IOMMU_TLB_INVALIDATE, &inv_data); +Those invalidations can happen at various granularity levels, page, context, ... + +The ARM SMMU specification introduces another challenge: MSIs are translated by +both the virtual SMMU and the physical SMMU. To build a nested mapping for the +IOVA programmed into the assigned device, the guest needs to pass its IOVA/MSI +doorbell GPA binding to the host. Then the hypervisor can build a nested stage 2 +binding eventually translating into the physical MSI doorbell. + +This is achieved by +ioctl(container, VFIO_IOMMU_BIND_MSI, &guest_binding); + VFIO User API ------------------------------------------------------------------------------- -- 2.5.5 _______________________________________________ kvmarm mailing list kvmarm@xxxxxxxxxxxxxxxxxxxxx https://lists.cs.columbia.edu/mailman/listinfo/kvmarm