Hi David,

> From: David Gibson [mailto:david@xxxxxxxxxxxxxxxxxxxxx]
> Sent: Friday, January 31, 2020 11:59 AM
> To: Liu, Yi L <yi.l.liu@xxxxxxxxx>
> Subject: Re: [RFC v3 02/25] hw/iommu: introduce DualStageIOMMUObject
>
> On Wed, Jan 29, 2020 at 04:16:33AM -0800, Liu, Yi L wrote:
> > From: Liu Yi L <yi.l.liu@xxxxxxxxx>
> >
> > Currently, many platform vendors provide the capability of dual stage
> > DMA address translation in hardware. For example, nested translation
> > on Intel VT-d scalable mode, nested stage translation on ARM SMMUv3,
> > etc. In dual stage DMA address translation, there are two stages of
> > address translation, stage-1 (a.k.a. first-level) and stage-2 (a.k.a.
> > second-level) translation structures. Stage-1 translation results are
> > also subjected to the stage-2 translation structures. Take vSVA
> > (Virtual Shared Virtual Addressing) as an example: the guest IOMMU
> > driver owns the stage-1 translation structures (covering GVA->GPA
> > translation), and the host IOMMU driver owns the stage-2 translation
> > structures (covering GPA->HPA translation). The VMM is responsible for
> > binding the stage-1 translation structures to the host, so that the
> > hardware can perform GVA->GPA and then GPA->HPA translation. For more
> > background on SVA, refer to the links below.
> > - https://www.youtube.com/watch?v=Kq_nfGK5MwQ
> > - https://events19.lfasiallc.com/wp-content/uploads/2017/11/\
> >   Shared-Virtual-Memory-in-KVM_Yi-Liu.pdf
> >
> > As above, dual stage DMA translation offers two stages of address
> > mapping, which can provide better DMA address translation support for
> > passthru devices. This is also what vIOMMU developers are doing so
> > far. Efforts include vSVA enabling from Yi Liu and SMMUv3 Nested
> > Stage Setup from Eric Auger.
> > https://www.spinics.net/lists/kvm/msg198556.html
> > https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg02842.html
> >
> > Both efforts aim to expose a vIOMMU backed by dual stage hardware.
> > As such, QEMU needs an explicit object to stand for the dual stage
> > capability from hardware. Such an object offers an abstraction for
> > the dual stage DMA translation related operations, like:
> >
> > 1) PASID allocation (allow host to intercept PASID allocation)
> > 2) bind stage-1 translation structures to host
> > 3) propagate stage-1 cache invalidation to host
> > 4) DMA address translation fault (I/O page fault) servicing etc.
> >
> > This patch introduces DualStageIOMMUObject to stand for the hardware
> > dual stage DMA translation capability. PASID allocation/free are the
> > first operations included in it; in future there will be more
> > operations, like bind_stage1_pgtbl and invalidate_stage1_cache.
> >
> > Cc: Kevin Tian <kevin.tian@xxxxxxxxx>
> > Cc: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>
> > Cc: Peter Xu <peterx@xxxxxxxxxx>
> > Cc: Eric Auger <eric.auger@xxxxxxxxxx>
> > Cc: Yi Sun <yi.y.sun@xxxxxxxxxxxxxxx>
> > Cc: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx>
> > Signed-off-by: Liu Yi L <yi.l.liu@xxxxxxxxx>
>
> Several overall queries about this:
>
> 1) Since it's explicitly handling PASIDs, this seems a lot more
> specific to SVM than the name suggests. I'd suggest a rename.

It will not be specific to SVM in future. We have efforts to move guest
IOVA support onto the host IOMMU's dual-stage DMA translation capability.
Guest IOVA support will then also reuse the methods provided by this
abstract layer, e.g. bind_guest_pgtbl() and flush_iommu_iotlb().

For the naming, how about HostIOMMUContext? This layer provides explicit
methods for setting up dual-stage DMA translation in the host.

> 2) Why are you hand rolling structures of pointers, rather than making
> this a QOM class or interface and putting those things into methods?

Maybe the name is not proper. Although I named it DualStageIOMMUObject, it
is actually the kind of abstract layer we discussed in a previous email. I
think this is similar to VFIO_MAP/UNMAP.
The difference is that VFIO_MAP/UNMAP programs mappings into the host
iommu domain, while the newly added explicit method links the guest page
table to the host iommu domain. VFIO_MAP/UNMAP is exposed to vIOMMU
emulators via the MemoryRegion layer, right? Maybe adding a similar
abstract layer is enough. Is adding QOM really necessary for this case?

> 3) It's not really clear to me if this is for the case where both
> stages of translation are visible to the guest, or only one of
> them.

For this case, the vIOMMU only exposes a single stage of translation to
the VM. e.g. on Intel VT-d, the vIOMMU exposes first-level translation to
the guest. Hardware IOMMUs with the dual-stage translation capability let
the guest own the stage-1 translation structures while the host owns the
stage-2 translation structures. The VMM is responsible for binding the
guest's translation structures to the host and enabling dual-stage
translation, e.g. on Intel VT-d, configuring the translation type to be
NESTED.

Take guest SVM as an example: the guest iommu driver owns the gVA->gPA
mappings, which are treated as stage-1 translation from the host point of
view. The host itself owns the gPA->hPA translation, which is called
stage-2 translation when dual-stage translation is configured.

For guest IOVA, it is similar to guest SVM. The guest iommu driver owns
the gIOVA->gPA mappings, which are treated as stage-1 translation. The
host owns the gPA->hPA translation.

Regards,
Yi Liu