On Thu, Apr 01, 2021 at 07:04:01AM +0000, Liu, Yi L wrote: > After reading your reply in https://lore.kernel.org/linux-iommu/20210331123801.GD1463678@xxxxxxxxxx/#t > So you mean /dev/ioasid FD is per-VM instead of per-ioasid, so above skeleton > doesn't suit your idea. You can do it one PASID per FD or multiple PASID's per FD. Most likely we will have high numbers of PASID's in a qemu process so I assume that number of FDs will start to be a contraining factor, thus multiplexing is reasonable. It doesn't really change anything about the basic flow. digging deeply into it either seems like a reasonable choice. > +-----------------------------+-----------------------------------------------+ > | userspace | kernel space | > +-----------------------------+-----------------------------------------------+ > | ioasid_fd = | /dev/ioasid does below: | > | open("/dev/ioasid", O_RDWR);| struct ioasid_fd_ctx { | > | | struct list_head ioasid_list; | > | | ... | > | | } ifd_ctx; // ifd_ctx is per ioasid_fd | Sure, possibly an xarray not a list > +-----------------------------+-----------------------------------------------+ > | ioctl(ioasid_fd, | /dev/ioasid does below: | > | ALLOC, &ioasid); | struct ioasid_data { | > | | ioasid_t ioasid; | > | | struct list_head device_list; | > | | struct list_head next; | > | | ... | > | | } id_data; // id_data is per ioasid | > | | | > | | list_add(&id_data.next, | > | | &ifd_ctx.ioasid_list); > | Yes, this should have a kref in it too > +-----------------------------+-----------------------------------------------+ > | ioctl(device_fd, | VFIO does below: | > | DEVICE_ALLOW_IOASID, | 1) get ioasid_fd, check if ioasid_fd is valid | > | ioasid_fd, | 2) check if ioasid is allocated from ioasid_fd| > | ioasid); | 3) register device/domain info to /dev/ioasid | > | | tracked in id_data.device_list | > | | 4) record the ioasid in VFIO's per-device | > | | ioasid list for future security check | You would provide a function that does steps 1&2 look at eventfd for instance. I'm not sure we need to register the device with the ioasid. device should incr the kref on the ioasid_data at this point. > +-----------------------------+-----------------------------------------------+ > | ioctl(ioasid_fd, | /dev/ioasid does below: | > | BIND_PGTBL, | 1) find ioasid's id_data | > | pgtbl_data, | 2) loop the id_data.device_list and tell iommu| > | ioasid); | give ioasid access to the devices > | This seems backwards, DEVICE_ALLOW_IOASID should tell the iommu to give the ioasid to the device. Here the ioctl should be about assigning a memory map from the the current mm_struct to the pasid > +-----------------------------+-----------------------------------------------+ > | ioctl(ioasid_fd, | /dev/ioasid does below: | > | UNBIND_PGTBL, | 1) find ioasid's id_data | > | ioasid); | 2) loop the id_data.device_list and tell iommu| > | | clear ioasid access to the devices | Also seems backwards. The ioctl here should be 'destroy ioasid' which wipes out the page table, halts DMA access and parks the PASID until all users are done. > +-----------------------------+-----------------------------------------------+ > | ioctl(device_fd, | VFIO does below: | > | DEVICE_DISALLOW_IOASID,| 1) check if ioasid is associated in VFIO's | > | ioasid_fd, | device ioasid list. | > | ioasid); | 2) unregister device/domain info from | > | | /dev/ioasid, clear in id_data.device_list | This should disconnect the iommu and kref_put the ioasid_data Remember the layering, only the device_fd knows what the pci_device is that it is touching, it doesn't make alot of sense to leak that into the ioasid world that should only be dealing with the page table mapping. > +-----------------------------+-----------------------------------------------+ > | ioctl(ioasid_fd, | /dev/ioasid does below: | > | FREE, ioasid); | list_del(&id_data.next); | > +-----------------------------+-----------------------------------------------+ Don't know if we need a free. The sequence above is backwards, the page table should be setup, the device authorized, device de-authorized then page table destroyed. PASID recycles once everyone is released. Include a sequence showing how the kvm FD is used to program the vPASID to pPASID table that ENQCMD uses. Show how dynamic authorization works based on requests from the guest's vIOMMU Jason