Hi Jason, > From: Jason Gunthorpe <jgg@xxxxxxxxxx> > Sent: Thursday, April 1, 2021 7:54 PM > > On Thu, Apr 01, 2021 at 07:04:01AM +0000, Liu, Yi L wrote: > > > After reading your reply in https://lore.kernel.org/linux- > iommu/20210331123801.GD1463678@xxxxxxxxxx/#t > > So you mean /dev/ioasid FD is per-VM instead of per-ioasid, so above > skeleton > > doesn't suit your idea. > > You can do it one PASID per FD or multiple PASID's per FD. Most likely > we will have high numbers of PASID's in a qemu process so I assume > that number of FDs will start to be a contraining factor, thus > multiplexing is reasonable. > > It doesn't really change anything about the basic flow. > > digging deeply into it either seems like a reasonable choice. > > > +-----------------------------+-----------------------------------------------+ > > | userspace | kernel space | > > +-----------------------------+-----------------------------------------------+ > > | ioasid_fd = | /dev/ioasid does below: | > > | open("/dev/ioasid", O_RDWR);| struct ioasid_fd_ctx { | > > | | struct list_head ioasid_list; | > > | | ... | > > | | } ifd_ctx; // ifd_ctx is per ioasid_fd | > > Sure, possibly an xarray not a list > > > +-----------------------------+-----------------------------------------------+ > > | ioctl(ioasid_fd, | /dev/ioasid does below: | > > | ALLOC, &ioasid); | struct ioasid_data { | > > | | ioasid_t ioasid; | > > | | struct list_head device_list; | > > | | struct list_head next; | > > | | ... | > > | | } id_data; // id_data is per ioasid | > > | | | > > | | list_add(&id_data.next, | > > | | &ifd_ctx.ioasid_list); > > | > > Yes, this should have a kref in it too > > > +-----------------------------+-----------------------------------------------+ > > | ioctl(device_fd, | VFIO does below: | > > | DEVICE_ALLOW_IOASID, | 1) get ioasid_fd, check if ioasid_fd is valid | > > | ioasid_fd, | 2) check if ioasid is allocated from ioasid_fd| > > | ioasid); | 3) register device/domain info to /dev/ioasid | > > | | tracked in id_data.device_list | > > | | 4) record the ioasid in VFIO's per-device | > > | | ioasid list for future security check | > > You would provide a function that does steps 1&2 look at eventfd for > instance. > > I'm not sure we need to register the device with the ioasid. device > should incr the kref on the ioasid_data at this point. > > > +-----------------------------+-----------------------------------------------+ > > | ioctl(ioasid_fd, | /dev/ioasid does below: | > > | BIND_PGTBL, | 1) find ioasid's id_data | > > | pgtbl_data, | 2) loop the id_data.device_list and tell iommu| > > | ioasid); | give ioasid access to the devices > > | > > This seems backwards, DEVICE_ALLOW_IOASID should tell the iommu to > give the ioasid to the device. > > Here the ioctl should be about assigning a memory map from the the > current > mm_struct to the pasid > > > +-----------------------------+-----------------------------------------------+ > > | ioctl(ioasid_fd, | /dev/ioasid does below: | > > | UNBIND_PGTBL, | 1) find ioasid's id_data | > > | ioasid); | 2) loop the id_data.device_list and tell iommu| > > | | clear ioasid access to the devices | > > Also seems backwards. The ioctl here should be 'destroy ioasid' which > wipes out the page table, halts DMA access and parks the PASID until > all users are done. > > > +-----------------------------+-----------------------------------------------+ > > | ioctl(device_fd, | VFIO does below: | > > | DEVICE_DISALLOW_IOASID,| 1) check if ioasid is associated in VFIO's | > > | ioasid_fd, | device ioasid list. | > > | ioasid); | 2) unregister device/domain info from | > > | | /dev/ioasid, clear in id_data.device_list | > > This should disconnect the iommu and kref_put the ioasid_data thanks for the comments, updated the skeleton a little bit, accepted your Xarray and kref suggestion. +-----------------------------+------------------------------------------------+ | userspace | kernel space | +-----------------------------+------------------------------------------------+ | ioasid_fd = | /dev/ioasid does below: | | open("/dev/ioasid", O_RDWR);| struct ioasid_fd_ctx { | | | struct xarray xa; | | | ... | | | } ifd_ctx; // ifd_ctx is per ioasid_fd | +-----------------------------+------------------------------------------------+ | ioctl(ioasid_fd, | /dev/ioasid does below: | | ALLOC, &ioasid); | struct ioasid_data { | | | ioasid_t ioasid; | | | refcount_t refs; | | | ... | | | } id_data; // id_data is per ioasid | | | | | | refcount_set(&id_data->refs, 1); | +-----------------------------+------------------------------------------------+ | ioctl(device_fd, | VFIO does below: | | DEVICE_ALLOW_IOASID, | 1) get ioasid_fd, check if ioasid_fd is valid | | ioasid_fd, | 2) check if ioasid is allocated from ioasid_fd | | ioasid); | 3) inr refcount on the ioasid | | | 4) tell iommu to give the ioasid to the device | | | by an iommu API. iommu driver needs to | | | store the ioasid/device info in a per | | | ioasid allow device list | | | 5) record the ioasid in VFIO's per-device | | | ioasid list for future security check | +-----------------------------+------------------------------------------------+ | ioctl(ioasid_fd, | /dev/ioasid does below: | | BIND_PGTBL, | 1) find ioasid's id_data | | pgtbl_data, | 2) call into iommu driver with ioasid, pgtbl | | ioasid); | data, iommu driver setup the PASID entry[1] | | | with the ioasid and the pgtbl_data | +-----------------------------+------------------------------------------------+ | ioctl(ioasid_fd, | /dev/ioasid does below: | | CAHCE_INVLD, | 1) find ioasid's id_data | | inv_data, | 2) call into iommu driver with ioasid, inv | | ioasid); | data, iommu driver invalidates cache | +-----------------------------+------------------------------------------------+ | ioctl(ioasid_fd, | /dev/ioasid does below: | | UNBIND_PGTBL, | 1) find ioasid's id_data | | ioasid); | 2) call into iommu driver with ioasid, iommu | | | driver destroy the PASID entry to block DMA | | | with this ioasid from device | +-----------------------------+------------------------------------------------+ | ioctl(device_fd, | VFIO does below: | | DEVICE_DISALLOW_IOASID,| 1) check if ioasid is associated in VFIO's | | ioasid_fd, | device ioasid list | | ioasid); | 2) tell iommu driver to clear the device from | | | its per-ioasid device allow list | | | 3) put refcount on the ioasid | +-----------------------------+------------------------------------------------+ | ioctl(ioasid_fd, | /dev/ioasid does below: | | FREE, ioasid); | list_del(&id_data.next); | +-----------------------------+------------------------------------------------+ [1] PASID entry is an entry in a per-device PASID table, this is where the page table pointer is stored. e.g. guest cr3 page table pointer. Setup PASID entry in a device's PASID table means the access is finally grant in IOMMU side. I kept FREE as it seems to be more symmetric since there is an ALLOC exposed to userspace. But yeah, I'm open with removing it all the same if it's really unnecessary per your opinion. Need your help again on an open. The major purpose of this series is to support vSVA for guest based on nested translation. And there is another usage case which is also based on nested translation but don't have an ioasid. And still, it needs the bind/unbind_pgtbl, cache_invalidation uAPI. It is gIOVA support. In this usage, page table is a guest IOVA page table, VMM needs to bind this page table to host and enabled nested translation, also needs to do cache invalidation when guest IOVA page table has changes. It's very similar with the page table bind of vSVA. Only difference is there is no ioasid in the gIOVA case. Instead, gIOVA case requires device information. But with regards to the uAPI reusing, need to fit gIOVA to /dev/ioasid model. As of now, I think it may require user space passes a device FD to the BIND/UNBIND_PGTBL and CAHCE_INVLD ioctl, then iommu driver can bind the gIOVA page table to a correct device. Not sure if it looks good. Do you have any suggestion on it? [...] > Include a sequence showing how the kvm FD is used to program the > vPASID to pPASID table that ENQCMD uses. > > Show how dynamic authorization works based on requests from the > guest's vIOMMU I'd like to see if the updated skeleton suits your idea first, then draw a more complete flow to show this. Regards, Yi Liu > Jason