Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx> · Wed, 24 Mar 2021 12:05:28 -0700

Hi Jason,

On Mon, 22 Mar 2021 09:03:00 -0300, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Fri, Mar 19, 2021 at 11:22:21AM -0700, Jacob Pan wrote:
> > Hi Jason,
> > 
> > On Fri, 19 Mar 2021 10:54:32 -0300, Jason Gunthorpe <jgg@xxxxxxxxxx>
> > wrote: 
> > > On Fri, Mar 19, 2021 at 02:41:32PM +0100, Jean-Philippe Brucker
> > > wrote:  
> > > > On Fri, Mar 19, 2021 at 09:46:45AM -0300, Jason Gunthorpe wrote:    
> > > > > On Fri, Mar 19, 2021 at 10:58:41AM +0100, Jean-Philippe Brucker
> > > > > wrote: 
> > > > > > Although there is no use for it at the moment (only two upstream
> > > > > > users and it looks like amdkfd always uses current too), I quite
> > > > > > like the client-server model where the privileged process does
> > > > > > bind() and programs the hardware queue on behalf of the client
> > > > > > process.    
> > > > > 
> > > > > This creates a lot of complexity. How does process A get a secure
> > > > > reference to B? How does it access the memory in B to set up the
> > > > > HW?    
> > > > 
> > > > mm_access() for example, and passing addresses via IPC    
> > > 
> > > I'd rather the source process establish its own PASID and then pass
> > > the rights to use it to some other process via FD passing than try to
> > > go the other way. There are lots of security questions with something
> > > like mm_access.
> > >   
> > 
> > Thank you all for the input. It sounds like we are OK to remove the mm
> > argument from iommu_sva_bind_device() and iommu_sva_alloc_pasid() for
> > now?
> > 
> > Let me try to summarize PASID allocation as below:
> > 
> > Interfaces     | Usage          | Limit  | bind¹ | User visible
> > /dev/ioasid²   | G-SVA/IOVA     | cgroup | No    | Yes
> > char dev³      | SVA            | cgroup | Yes   | No
> > iommu driver   | default PASID  | no     | No    | No
> > kernel         | super SVA      | no     | Yes   | No
> > 
> > ¹ Allocated during SVA bind
> > ² PASIDs allocated via /dev/ioasid are not bound to any mm, but their
> >   ownership is assigned to the process that does the allocation.  
> 
> What does "not bound to a mm" mean?
> 
I meant that the IOASID allocated via /dev/ioasid is in a clean state (just
a number). Its initial state is not bound to an mm. This is unlike
sva_bind_device(), where the IOASID is allocated at bind time.

The use case is to support guest SVA bind, where allocation and bind are in
two separate steps.
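To make the distinction concrete, here is a minimal userspace sketch of the two-step model, assuming a hypothetical global table. `struct hyp_ioasid`, `hyp_ioasid_alloc()`, `hyp_ioasid_bind()` and `hyp_sva_bind()` are illustrative names only, not the kernel ioasid or SVA API, and the mm is modeled as an opaque pointer. Allocation hands out a number in the clean (unbound) state; binding is a separate, later step, whereas a bind-time interface does both in one call.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model for illustration; not the kernel ioasid API. */
#define HYP_MAX_IOASID     8
#define HYP_INVALID_IOASID 0u   /* ID 0 reserved to mean "none" */

struct hyp_ioasid {
	bool allocated;
	bool bound;        /* false right after allocation: "just a number" */
	const void *mm;    /* opaque stand-in for the mm bound later */
};

static struct hyp_ioasid hyp_table[HYP_MAX_IOASID];

/* Step 1: allocate an ID in the clean, unbound state. */
static unsigned int hyp_ioasid_alloc(void)
{
	for (unsigned int id = 1; id < HYP_MAX_IOASID; id++) {
		if (!hyp_table[id].allocated) {
			hyp_table[id] = (struct hyp_ioasid){ .allocated = true };
			return id;
		}
	}
	return HYP_INVALID_IOASID;
}

/* Step 2: bind an already-allocated ID to an mm at some later point. */
static int hyp_ioasid_bind(unsigned int id, const void *mm)
{
	if (id == HYP_INVALID_IOASID || id >= HYP_MAX_IOASID ||
	    !hyp_table[id].allocated)
		return -EINVAL;
	if (hyp_table[id].bound)
		return -EBUSY;
	hyp_table[id].bound = true;
	hyp_table[id].mm = mm;
	return 0;
}

/* Contrast: a bind-time interface allocates and binds in one call. */
static unsigned int hyp_sva_bind(const void *mm)
{
	unsigned int id = hyp_ioasid_alloc();

	if (id != HYP_INVALID_IOASID) {
		int ret = hyp_ioasid_bind(id, mm);

		/* A freshly allocated ID is unbound, so this cannot fail. */
		assert(ret == 0);
		(void)ret;
	}
	return id;
}
```

In the guest SVA flow, step 1 would correspond to the guest's allocation request and step 2 to the later bind with the host IOMMU.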

> IMHO a user-created PASID is either bound to a mm (current) at creation
> time, or it will never be bound to a mm and its page table is under
> user control via /dev/ioasid.
> 
True for PASIDs used in native SVA bind. But for binding with a guest mm,
the PASID is allocated first (VT-d virtual command interface, spec 10.4.44),
then bound with the host IOMMU when the vIOMMU PASID cache is invalidated.

Our intention is to have two separate interfaces:
1. /dev/ioasid (allocation/free only)
2. /dev/sva (handles all SVA related activities including page tables)

> I thought the whole point of something like a /dev/ioasid was to get
> away from each and every device creating its own PASID interface?
> 
Yes, but only for the use cases that need to expose the PASID to userspace.
AFAICT, the cases are:
1. guest SVA (bind guest mm)
2. full PF/VF assignment (not mediated), where the guest driver wants to
program the actual PASID onto the device.

> It may be somewhat reasonable that some devices could have some easy
> 'make a SVA PASID on current' interface built in,
I agree; this is the case where the PASID is hidden from userspace, right?
e.g. uacce.

> but anything more
> complicated should use /dev/ioasid, and anything consuming PASID
> should also have an API to import and attach a PASID from /dev/ioasid.
> 
Would the above two use cases meet the "complicated" criterion? Or should
we say that anything that needs the explicit PASID value has to go through
/dev/ioasid?

Could you give some high-level hints on the APIs that hook up IOASIDs
allocated from /dev/ioasid, and on use cases that combine device and domain
information? Yi is working on a /dev/sva RFC, so it would be good to have a
direction check.

> > Currently, the proposed /dev/ioasid interface does not map individual
> > PASIDs to FDs. The FD is at the ioasid_set granularity and bound to
> > the current mm. We could extend the IOCTLs to cover passing individual
> > PASIDs via FDs when use cases arise. Would this work?  
> 
> Is it a good idea that the FD is per ioasid_set?
We were thinking the allocation IOCTL is on a per-set basis, so that we know
the ownership relationship between PASIDs and their set. If a per-PASID FD
is needed, we can extend it.
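A rough sketch of why per-set allocation makes ownership explicit, assuming a hypothetical model in which each allocated PASID records the set that allocated it, and only that set may later free (or bind) it. `struct hyp_set`, `hyp_set_alloc()`, `hyp_set_owns()` and `hyp_set_free()` are illustrative names, not the kernel ioasid API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model for illustration; not the kernel ioasid API. */
#define HYP_NR_PASIDS 16

struct hyp_set {
	int id;   /* e.g. one set per VM, i.e. per /dev/ioasid FD */
};

/* Global PASID space; each entry remembers the set that allocated it. */
static const struct hyp_set *hyp_owner[HYP_NR_PASIDS];

/* Allocation always goes through a set, so ownership is recorded. */
static int hyp_set_alloc(const struct hyp_set *set)
{
	assert(set != NULL);
	for (int pasid = 1; pasid < HYP_NR_PASIDS; pasid++) {
		if (!hyp_owner[pasid]) {
			hyp_owner[pasid] = set;
			return pasid;
		}
	}
	return -ENOSPC;
}

/* Permission check: is this PASID owned by the requesting set? */
static bool hyp_set_owns(const struct hyp_set *set, int pasid)
{
	return pasid > 0 && pasid < HYP_NR_PASIDS && hyp_owner[pasid] == set;
}

/* Only the owning set may free (or, by the same check, bind) a PASID. */
static int hyp_set_free(const struct hyp_set *set, int pasid)
{
	if (!hyp_set_owns(set, pasid))
		return -EPERM;
	hyp_owner[pasid] = NULL;
	return 0;
}
```

Because the owner is captured at allocation time, a later bind request can be rejected with a single pointer comparison rather than a lookup through some separate ownership record.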

> What is the set used
> for?
> 
I tried to document the concept in
https://lore.kernel.org/lkml/1614463286-97618-2-git-send-email-jacob.jun.pan@xxxxxxxxxxxxxxx/

In terms of usage for guest SVA, an ioasid_set is mostly tied to a host mm.
The use cases are as follows:
1. Identify a pool of PASIDs for permission checking (belonging to the same
VM), e.g. only allow SVA binding for PASIDs allocated from the same set.

2. Allow different PASID-aware kernel subsystems to associate, e.g. KVM,
device drivers, and IOMMU driver. i.e. each KVM instance only cares about
the ioasid_set associated with the VM. Event notifications are also scoped
to the ioasid_set to synchronize PASID states.

3. Guest-host PASID lookup (each set has its own XArray to store the
mapping)

4. Quota control (going away once we have cgroup)
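The per-set lookup and quota points can be sketched as follows. This is a minimal model under stated assumptions: a fixed-size array stands in for the per-set XArray mentioned above, and `struct hyp_ioasid_set`, `hyp_set_add()` and `hyp_set_find()` are hypothetical names rather than kernel interfaces.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical per-set state: a fixed array stands in for the per-set
 * XArray that maps guest PASIDs to host PASIDs.
 */
#define HYP_GPASID_MAX 32
#define HYP_NO_PASID   0u

struct hyp_ioasid_set {
	unsigned int quota;                  /* max PASIDs this set may hold */
	unsigned int nr_allocated;
	unsigned int g2h[HYP_GPASID_MAX];    /* guest PASID -> host PASID */
};

/* Record a guest->host mapping in this set, enforcing the set's quota. */
static int hyp_set_add(struct hyp_ioasid_set *set, unsigned int gpasid,
		       unsigned int hpasid)
{
	assert(hpasid != HYP_NO_PASID);
	if (gpasid >= HYP_GPASID_MAX)
		return -EINVAL;
	if (set->g2h[gpasid] != HYP_NO_PASID)
		return -EEXIST;              /* guest PASID already mapped */
	if (set->nr_allocated >= set->quota)
		return -ENOSPC;              /* per-set quota exhausted */
	set->g2h[gpasid] = hpasid;
	set->nr_allocated++;
	return 0;
}

/* Look up the host PASID for a guest PASID, within this set only. */
static unsigned int hyp_set_find(const struct hyp_ioasid_set *set,
				 unsigned int gpasid)
{
	return gpasid < HYP_GPASID_MAX ? set->g2h[gpasid] : HYP_NO_PASID;
}
```

Since each set keeps its own table, two VMs can use the same guest PASID value without colliding, and a lookup from one VM's set never sees another VM's mappings.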

> Usually kernel interfaces work nicer with a one fd/one object model.
> 
> But even if it is a set, you could pass the set between co-operating
> processes and the PASID can be created in the correct 'current'. But
> there are all kinds of security questions as soon as you start doing
> anything like this - is there really a use case?
> 
We don't see a use case for passing an ioasid_set to another process. All
four use cases above are for the current process.

> Jason

Thanks,

Jacob