Hi Kevin,

On 8/5/21 2:36 AM, Tian, Kevin wrote:
>> From: Eric Auger <eric.auger@xxxxxxxxxx>
>> Sent: Wednesday, August 4, 2021 11:59 PM
>>
> [...]
>>> 1.2. Attach Device to I/O address space
>>> +++++++++++++++++++++++++++++++++++++++
>>>
>>> Device attach/bind is initiated through passthrough framework uAPI.
>>>
>>> Device attaching is allowed only after a device is successfully bound to
>>> the IOMMU fd. User should provide a device cookie when binding the
>>> device through VFIO uAPI. This cookie is used when the user queries
>>> device capability/format, issues per-device iotlb invalidation and
>>> receives per-device I/O page fault data via IOMMU fd.
>>>
>>> Successful binding puts the device into a security context which isolates
>>> its DMA from the rest system. VFIO should not allow user to access the
>> s/from the rest system/from the rest of the system
>>> device before binding is completed. Similarly, VFIO should prevent the
>>> user from unbinding the device before user access is withdrawn.
>> With Intel scalable IOV, I understand you could assign an RID/PASID to
>> one VM and another one to another VM (which is not the case for ARM). Is
>> it a targeted use case? How would it be handled? Is it related to the
>> sub-groups evoked hereafter?
> Not related to sub-group. Each mdev is bound to the IOMMU fd respectively
> with the defPASID which represents the mdev.
But how does it work in terms of security? The device (RID) is bound to
an IOMMU fd. But then each SID/PASID may be working for a different VM.
How do you detect that this is safe, given that each SID can work safely
for a different VM, versus the ARM case where it is not possible?

1.3 says
"
1) A successful binding call for the first device in the group creates
   the security context for the entire group, by:
"
What does it mean for the above scalable IOV use case?

>
>> Actually all devices bound to an IOMMU fd should have the same parent
>> I/O address space or root address space, am I correct?
>> If so, maybe add this comment explicitly?
> in most cases yes but it's not mandatory. multiple roots are allowed
> (e.g. with vIOMMU but no nesting).
OK, right, this corresponds to example 4.2, for instance. I misinterpreted
the notion of security context. The security context does not match the
IOMMU fd but is something implicitly created on the 1st device binding.
>
> [...]
>>> The device in the /dev/iommu context always refers to a physical one
>>> (pdev) which is identifiable via RID. Physically each pdev can support
>>> one default I/O address space (routed via RID) and optionally multiple
>>> non-default I/O address spaces (via RID+PASID).
>>>
>>> The device in VFIO context is a logic concept, being either a physical
>>> device (pdev) or mediated device (mdev or subdev). Each vfio device
>>> is represented by RID+cookie in IOMMU fd. User is allowed to create
>>> one default I/O address space (routed by vRID from user p.o.v) per
>>> each vfio_device.
>> The concept of default address space is not fully clear for me. I
>> currently understand this is a
>> root address space (not nesting). Is that correct? This may need
>> clarification.
> w/o PASID there is only one address space (either GPA or GIOVA)
> per device. This one is called default. whether it's root is orthogonal
> (e.g. GIOVA could be also nested) to the device view of this space.
>
> w/ PASID additional address spaces can be targeted by the device.
> those are called non-default.
>
> I could also rename default to RID address space and non-default to
> RID+PASID address space if doing so makes it clearer.
Yes, I think it is worth having a kind of glossary that defines the root
and default address spaces, as you clearly defined child/parent.
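
To illustrate the kind of glossary entry meant here, below is a minimal
sketch of the two orthogonal properties discussed above: default versus
non-default (is a PASID needed to reach the space?) and root versus
nested (does the space have a parent?). All type and function names are
hypothetical and exist nowhere in the RFC; this is only one reading of
the definitions, not the proposed uAPI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none of this is part of the RFC uAPI. */

enum as_kind {
    AS_DEFAULT,     /* reached via RID only ("RID address space") */
    AS_NON_DEFAULT, /* reached via RID+PASID ("RID+PASID address space") */
};

struct io_as {
    uint16_t rid;               /* requester ID of the device */
    bool has_pasid;             /* true if a PASID selects this space */
    uint32_t pasid;             /* meaningful only when has_pasid is true */
    const struct io_as *parent; /* NULL for a root, else the parent space */
};

/* Default vs non-default depends only on whether a PASID is involved. */
static enum as_kind as_classify(const struct io_as *as)
{
    assert(as != NULL);
    return as->has_pasid ? AS_NON_DEFAULT : AS_DEFAULT;
}

/* Root vs nested depends only on whether the space has a parent. */
static bool as_is_root(const struct io_as *as)
{
    assert(as != NULL);
    return as->parent == NULL;
}
```

With this reading, a GIOVA space nested on a GPA space is still
"default" (no PASID) while not being a root, which matches the point
that the two notions are orthogonal.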
>
>>> VFIO decides the routing information for this default
>>> space based on device type:
>>>
>>> 1) pdev, routed via RID;
>>>
>>> 2) mdev/subdev with IOMMU-enforced DMA isolation, routed via
>>>    the parent's RID plus the PASID marking this mdev;
>>>
>>> 3) a purely sw-mediated device (sw mdev), no routing required i.e. no
>>>    need to install the I/O page table in the IOMMU. sw mdev just uses
>>>    the metadata to assist its internal DMA isolation logic on top of
>>>    the parent's IOMMU page table;
>> Maybe you should introduce this concept of SW mediated device earlier
>> because it seems to special case the way the attach behaves. I am
>> especially referring to
>>
>> "Successful attaching activates an I/O address space in the IOMMU, if the
>> device is not purely software mediated"
> makes sense.
>
>>> In addition, VFIO may allow user to create additional I/O address spaces
>>> on a vfio_device based on the hardware capability. In such case the user
>>> has its own view of the virtual routing information (vPASID) when marking
>>> these non-default address spaces.
>> I do not understand what "marking these non default address space" means.
> as explained above, those non-default address spaces are identified/routed
> via PASID.
>
>>> 1.3. Group isolation
>>> ++++++++++++++++++++
> [...]
>>> 1) A successful binding call for the first device in the group creates
>>>    the security context for the entire group, by:
>>>
>>>     * Verifying group viability in a similar way as VFIO does;
>>>
>>>     * Calling IOMMU-API to move the group into a block-dma state,
>>>       which makes all devices in the group attached to a block-dma
>>>       domain with an empty I/O page table;
>> this block-dma state/domain would deserve to be better defined (I know
>> you already evoked it in 1.1 with the dma mapping protocol though)
>> activates an empty I/O page table in the IOMMU (if the device is not
>> purely SW mediated)?
> sure.
> some explanations are scattered in the following paragraph, but I
> can consider clarifying it further.
>
>> How does that relate to the default address space? Is it the same?
> different. this block-dma domain doesn't hold any valid mapping. The
> default address space is represented by a normal unmanaged domain.
> the ioasid attaching operation will detach the device from the block-dma
> domain and then attach it to the target ioasid.
OK

Thanks

Eric
>
>>> 2. uAPI Proposal
>>> ----------------------
> [...]
>>> /*
>>>  * Allocate an IOASID.
>>>  *
>>>  * IOASID is the FD-local software handle representing an I/O address
>>>  * space. Each IOASID is associated with a single I/O page table. User
>>>  * must call this ioctl to get an IOASID for every I/O address space that is
>>>  * intended to be tracked by the kernel.
>>>  *
>>>  * User needs to specify the attributes of the IOASID and associated
>>>  * I/O page table format information according to one or multiple devices
>>>  * which will be attached to this IOASID right after. The I/O page table
>>>  * is activated in the IOMMU when it's attached by a device. Incompatible
>> .. if not SW mediated
>>>  * format between device and IOASID will lead to attaching failure.
>>>  *
>>>  * The root IOASID should always have a kernel-managed I/O page
>>>  * table for safety. Locked page accounting is also conducted on the root.
>> The definition of root IOASID is not easily found in this spec. Maybe
>> this would deserve some clarification.
> makes sense.
>
> and thanks for other typo-related comments.
>
> Thanks
> Kevin
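
The bind/attach sequence discussed under 1.3 above (the first bind moves
the whole group into block-dma, then attaching detaches a device from the
block-dma domain and attaches it to the target ioasid) can be sketched as
a toy state model. This is only an illustration under stated assumptions:
every name below is hypothetical, nothing here is a kernel or uAPI symbol,
and purely sw-mediated devices are deliberately left out because their
handling is still being clarified in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; names are illustrative, not kernel or uAPI symbols. */

enum dev_domain {
    DOM_UNBOUND,   /* group has no security context yet */
    DOM_BLOCK_DMA, /* empty I/O page table, no valid mapping */
    DOM_IOASID,    /* attached to a user-created I/O address space */
};

struct model_dev {
    enum dev_domain domain;
    int ioasid; /* valid only when domain == DOM_IOASID */
};

struct model_group {
    struct model_dev *devs;
    size_t ndevs;
    bool has_context; /* set once the first device has been bound */
};

/*
 * Binding the first device creates the security context for the entire
 * group: every device in the group lands in the block-dma domain.
 * Binding further devices of the same group changes nothing here.
 */
static void model_bind(struct model_group *g)
{
    if (g->has_context)
        return;
    for (size_t i = 0; i < g->ndevs; i++)
        g->devs[i].domain = DOM_BLOCK_DMA;
    g->has_context = true;
}

/*
 * Attaching leaves the block-dma domain (or a previous ioasid) and joins
 * the target ioasid. Returns -1 if the group was never bound.
 */
static int model_attach(struct model_group *g, size_t idx, int ioasid)
{
    assert(idx < g->ndevs);
    if (!g->has_context)
        return -1;
    g->devs[idx].domain = DOM_IOASID;
    g->devs[idx].ioasid = ioasid;
    return 0;
}
```

In this model an attach before any bind fails, and attaching one device
leaves the other devices of the group in block-dma, which is the property
the security context is meant to provide.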