Re: [PATCH v8 7/9] vfio/mdev: Add iommu related member in mdev_device

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, May 12, 2021 at 07:46:05AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Sent: Wednesday, May 12, 2021 1:37 AM
> > 
> > On Tue, May 11, 2021 at 02:56:05PM +0800, Lu Baolu wrote:
> > 
> > > >     After my next series the mdev drivers will have direct access to
> > > >     the vfio_device. So an alternative to using the struct device, or
> > > >     adding 'if mdev' is to add an API to the vfio_device world to
> > > >     inject what iommu configuration is needed from that direction
> > > >     instead of trying to discover it from a struct device.
> > >
> > > Just want to make sure that I understand you correctly.
> > >
> > > We should use the existing IOMMU in-kernel APIs to connect mdev with the
> > > iommu subsystem, so that the upper lays don't need to use something
> > > like (if dev_is_mdev) to handle mdev differently. Do I get you
> > > correctly?
> > 
> > After going through all the /dev/ioasid stuff I'm pretty convinced
> > that none of the PASID use cases for mdev should need any iommu
> > connection from the mdev_device - this is an artifact of trying to
> > cram the vfio container and group model into the mdev world and is not
> > good design.
> > 
> > The PASID interfaces for /dev/ioasid should use the 'struct
> > pci_device' for everything and never pass in a mdev_device to the
> > iommu layer.
> 
> 'struct pci_device' -> 'struct device' since /dev/ioasid also needs to support
> non-pci devices?

I don't know. PASID is a PCI concept, I half expect to have at least
some wrappers for PCI specific IOMMU APIs so there is reasonable type
safety possible. But maybe it is all general enough that isn't needed.

> I assume the so-called connection here implies using iommu_attach_device 
> to attach a device to an iommu domain. 

yes

> Did you suggest this connection must be done by the mdev driver
> which implements vfio_device and then passing iommu domain to
> /dev/ioasid when attaching the device to an IOASID? 

Why do we need iommu domain in a uAPI at all? It is an artifact of the
current kernel implementation

> sort of like: ioctl(device_fd, VFIO_ATTACH_IOASID, ioasid, domain);

ioasid and device_fd completely describe the IOMMU parameters, don't
they?

> If yes, this conflicts with one design in /dev/ioasid proposal that we're
> working on. In earlier discussion we agreed that each ioasid is associated
> to a singleton iommu domain and all devices that are attached to this 
> ioasid with compatible iommu capabilities just share this domain.

I think you need to stand back a bit more from the current detailed
implementation of the iommu API. /dev/ioasid needs to be able to
create IOASID objects that can be used with as many devices as
reasonable without duplicating the page tables. If or how that maps to
todays iommu layer I don't know.

Remember the uAPI is forever, the kernel internals can change.

> Baolu and I discussed below draft proposal to avoid passing mdev_device
> to the iommu layer. Please check whether it makes sense:
> 
> // for every device attached to an ioasid
> // mdev is represented by pasid (allocated by mdev driver)
> // pf/vf has INVALID_IOASID in pasid
> struct dev_info {
> 	struct list_head		next;
> 	struct device		*device;
> 	u32			pasid;
> }

This is a list of "attachments"? sure

> // for every allocated ioasid
> struct ioasid_info {
> 	// the handle to convey iommu operations
> 	struct iommu_domain	*domain;
> 	// metadata for map/unmap
> 	struct rb_node		dma_list;
> 	// the list of attached device
> 	struct dev_info		*dev_list;
> 	...
> }

Yes, probably something basically like that
 
> // called by VFIO/VDPA
> int ioasid_attach_device(struct *device, u32 ioasid, u32 pasid)

'u32 ioasid' should be a 'struct ioasid_info *' and 'pasid' is not
needed because ioasif_info->dev_list[..]->pasid already stores the
value.

Keep in mind at this API level the 'struct device *' here shuld be the
PCI device never the mdev device.

> {
> 	// allocate a new dev_info, filled with device/pasid
> 	// allocate iommu domain if it's the 1st attached device
> 	// check iommu compatibility if an domain already exists
> 
> 	// attach the device to the iommu domain
> 	if (pasid == INVALID_IOASID)
> 		iommu_attach_device(domain, device);
> 	else
> 		iommu_aux_attach_device(domain, device, pasid);

And if device is the pci_device I don't really see how this works.

This API layer would need to create some dummy struct device to attach
the aux domain too if you want to keep re-using the stuff we have
today.

The idea that a PASID is 1:1 with a 'struct device' is completely VFIO
centric thinking.

> // when attaching PF/VF to an ioasid
> ioctl(device_fd, VFIO_ATTACH_IOASID, ioasid);

This would have to be a (ioasif_fd, ioasid) tuple as an ioasid is
scoped within each fd.

> -> get vfio_device of device_fd
> -> ioasid_attach_device(vfio_device->dev, ioasid, INVALID_IOASID);

The device knows if it is going to use a PASID or not. The API level
here should always very explicitly indicate *device* intention.

If the device knows it is going to form DMA's with a PASID tag then it
absoultely must be completely explict and clear in the API to the
layers below that is what is happening.

'INVALID_IOASID' does not communicate that ideal to me.

> // when attaching a mdev to an ioasid
> ioctl(device_fd, VFIO_ATTACH_IOASID, ioasid);
> -> get vfio_device of device_fd
> -> find mdev_parent of vfio_device
> -> find pasid allocated to this mdev
> -> ioasid_attach_device(parent->dev, ioasid, pasid);

Again no, mdev shouldn't be involved, it is the wrong layering.

Jason



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux