Re: [PATCH v2 8/8] iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED

Jason Gunthorpe <jgg@xxxxxxxxxx> · Tue, 3 Sep 2024 20:55:32 -0300

On Tue, Sep 03, 2024 at 09:00:32AM +0000, Mostafa Saleh wrote:
> On Mon, Sep 02, 2024 at 09:30:22PM -0300, Jason Gunthorpe wrote:
> > On Mon, Sep 02, 2024 at 09:57:45AM +0000, Mostafa Saleh wrote:
> > > > > 2) Is there a reason the UAPI is designed this way?
> > > > > The way I imagined this, is that userspace will pass the pointer to the CD
> > > > > (+ format) not the STE (or part of it).
> > > > 
> > > > Yes, we need more information from the STE than just that. EATS and
> > > > STALL for instance. And the cachability below. Who knows what else in
> > > > the future.
> > > 
> > > But for example if that was extended later, how can user space know
> > > which fields are allowed and which are not?
> > 
> > Changes the vSTE rules that require userspace being aware would have
> > to be signaled in the GET_INFO answer. This is the same process no
> > matter how you encode the STE bits in the structure.
>
> How?

--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -504,6 +504,11 @@ struct iommu_hw_info_vtd {
        __aligned_u64 ecap_reg;
 };
 
+enum {
+       /* The kernel understand field NEW in the STE */
+       IOMMU_HW_INFO_ARM_SMMUV3_VSTE_NEW = 1 << 0,
+};
+
 /**
  * struct iommu_hw_info_arm_smmuv3 - ARM SMMUv3 hardware information
  *                                   (IOMMU_HW_INFO_TYPE_ARM_SMMUV3)
@@ -514,6 +519,7 @@ struct iommu_hw_info_vtd {
  * @iidr: Information about the implementation and implementer of ARM SMMU,
  *        and architecture version supported
  * @aidr: ARM SMMU architecture version
+ * @kernel_capabilities: Bitmask of IOMMU_HW_INFO_ARM_SMMUV3_*
  *
  * For the details of @idr, @iidr and @aidr, please refer to the chapters
  * from 6.3.1 to 6.3.6 in the SMMUv3 Spec.
@@ -535,6 +541,7 @@ struct iommu_hw_info_arm_smmuv3 {
        __u32 idr[6];
        __u32 iidr;
        __u32 aidr;
+       __u32 kernel_capabilities;
 };
 
 /**

For example. There are all sorts of rules about 0 filling and things
that make this work trivially for the userspace.

> And why changing that in the future is not a problem as
> sanitising IDRs?

Reporting a static kernel capability through GET_INFO output is
easier/saner than providing some kind of policy flags in the GET_INFO
input to specify how the sanitization should work.

> > This confirmation of kernel support would then be reflected in the
> > vIDRs to the VM and the VM could know to set the extended bits.
> > 
> > Otherwise setting an invalidate vSTE will fail the ioctl, the VMM can
> > log the event, generate an event and install an abort vSTE.
> > 
> > > > Overall this sort of direct transparency is how I prefer to see these
> > > > kinds of iommufd HW specific interfaces designed. From a lot of
> > > > experience here, arbitary marshall/unmarshall is often an
> > > > antipattern :)
> > > 
> > > Is there any documentation for the (proposed) SMMUv3 UAPI for IOMMUFD?
> > 
> > Just the comments in this series?
> 
> But this is a UAPI. How can userspace implement that if it has no
> documentation, and how can it be maintained if there is no clear
> interface with userspace with what is expected/returned...

I'm not sure what you are looking for here? I don't think an entire
tutorial on how to build a paravirtualized vSMMU is appropriate to
put in comments?

The behavior of the vSTE processing as a single feature should be
understandable, and I think it is from the comments and code. If it
isn't, lets improve that.

There is definitely a jump from knowing how these point items work to
knowing how to build a para virtualized vSMMU in your VMM. This is
likely a gap of thousands of lines of code in userspace :\

> But we have a different model, with virtio-iommu, it typically presents
> the device to the VM and on the backend it calls VFIO MAP/UNMAP.

I thought pkvm's model was also map/unmap - so it could suppor HW
without nesting?

> Although technically we can have virtio-iommu in the hypervisor (EL2),
> that is a lot of complexit and increase in the TCB of pKVM.

That is too bad, it would be nice to not have to do everything new
from scratch to just get to the same outcome. :(

> I haven’t been keeping up with iommufd lately, I will try to spend more
> time on that in the future.
> But my idea is that we would create an IOMMUFD, attach it to a device and then
> through some extra IOCTLs, we can configure some “virtual” topology for it which
> then relies on KVM, again this is very early, and we need to support pKVM IOMMUs
> in the host first (I plan to send v2 RFC soon for that)

Most likely your needs here will match the needs of the confidential
compute people which are basically doing that same stuf. The way pKVM
wants to operate looks really similar to me to how the confidential
compute stuff wants to work where the VMM is untrusted and operations
are delegated to some kind of secure world.

So, for instance, AMD recently posted patches about how they would
create vPCI devices in their secure world, and there are various
things in the works for secure IOMMUs and so forth all with the
intention of not trusting the VMM, or permitting the VMM to compromise
the VM.

I would *really* like everyone to sit down and figure out how to
manage virtual device lifecycle in a single language!

> > I haven't heard if someone is going to KVM forum to talk about
> > vSMMUv3? Eric? Nicolin do you know?
> 
> I see, I won’t be in KVM forum, but I plan to attend LPC, we can discuss
> further there if people are interested.

Sure, definately, look forward to meeting you!

Jason