On Mon, May 31, 2021 at 07:31:57PM +0800, Liu Yi L wrote: > > > /* > > > * Get information about an I/O address space > > > * > > > * Supported capabilities: > > > * - VFIO type1 map/unmap; > > > * - pgtable/pasid_table binding > > > * - hardware nesting vs. software nesting; > > > * - ... > > > * > > > * Related attributes: > > > * - supported page sizes, reserved IOVA ranges (DMA mapping); > > > * - vendor pgtable formats (pgtable binding); > > > * - number of child IOASIDs (nesting); > > > * - ... > > > * > > > * Above information is available only after one or more devices are > > > * attached to the specified IOASID. Otherwise the IOASID is just a > > > * number w/o any capability or attribute. > > > > This feels wrong to learn most of these attributes of the IOASID after > > attaching to a device. > > but an IOASID is just a software handle before attached to a specific > device. e.g. before attaching to a device, we have no idea about the > supported page size in underlying iommu, coherent etc. The idea is you attach the device to the /dev/ioasid FD and this action is what crystalizes the iommu driver that is being used: device_fd = open("/dev/vfio/devices/dev1", mode); ioasid_fd = open("/dev/ioasid", mode); ioctl(device_fd, VFIO_BIND_IOASID_FD, ioasid_fd); After this sequence we should have most of the information about the IOMMU. One /dev/ioasid FD has one iommu driver. Design what an "iommu driver" means so that the system should only have one. Eg the coherent/not coherent distinction should not be a different "iommu driver". Device attach to the _IOASID_ is a different thing, and I think it puts the whole sequence out of order because we loose the option to customize the IOASID before it has to be realized into HW format. > > The user should have some idea how it intends to use the IOASID when > > it creates it and the rest of the system should match the intention. > > > > For instance if the user is creating a IOASID to cover the guest GPA > > with the intention of making children it should indicate this during > > alloc. > > > > If the user is intending to point a child IOASID to a guest page table > > in a certain descriptor format then it should indicate it during > > alloc. > > Actually, we have only two kinds of IOASIDs so far. Maybe at a very very high level, but it looks like there is alot of IOMMU specific configuration that goes into an IOASD. > > device bind should fail if the device somehow isn't compatible with > > the scheme the user is tring to use. > > yeah, I guess you mean to fail the device attach when the IOASID is a > nesting IOASID but the device is behind an iommu without nesting support. > right? Right.. > > > > > /* > > > * Map/unmap process virtual addresses to I/O virtual addresses. > > > * > > > * Provide VFIO type1 equivalent semantics. Start with the same > > > * restriction e.g. the unmap size should match those used in the > > > * original mapping call. > > > * > > > * If IOASID_REGISTER_MEMORY has been called, the mapped vaddr > > > * must be already in the preregistered list. > > > * > > > * Input parameters: > > > * - u32 ioasid; > > > * - refer to vfio_iommu_type1_dma_{un}map > > > * > > > * Return: 0 on success, -errno on failure. > > > */ > > > #define IOASID_MAP_DMA _IO(IOASID_TYPE, IOASID_BASE + 6) > > > #define IOASID_UNMAP_DMA _IO(IOASID_TYPE, IOASID_BASE + 7) > > > > What about nested IOASIDs? > > at first glance, it looks like we should prevent the MAP/UNMAP usage on > nested IOASIDs. At least hardware nested translation only allows MAP/UNMAP > on the parent IOASIDs and page table bind on nested IOASIDs. But considering > about software nesting, it seems still useful to allow MAP/UNMAP usage > on nested IOASIDs. This is how I understand it, how about your opinion > on it? do you think it's better to allow MAP/UNMAP usage only on parent > IOASIDs as a start? If the only form of nested IOASID is the "read the page table from my process memory" then MAP/UNMAP won't make sense on that.. MAP/UNMAP is only useful if the page table is stored in kernel memory. > > > #define IOASID_CREATE_NESTING _IO(IOASID_TYPE, IOASID_BASE + 8) > > > > Do you think another ioctl is best? Should this just be another > > parameter to alloc? > > either is fine. This ioctl is following one of your previous comment. Sometimes I say things in a way that is ment to be easier to understand conecpts not necessarily good API design :) > > > #define IOASID_BIND_PGTABLE _IO(IOASID_TYPE, IOASID_BASE + 9) > > > #define IOASID_UNBIND_PGTABLE _IO(IOASID_TYPE, IOASID_BASE + 10) > > > > Also feels backwards, why wouldn't we specify this, and the required > > page table format, during alloc time? > > here the model is user-space gets the page table format from kernel and > decide if it can proceed. So what you are suggesting is user-space should > tell kernel the page table format it has in ALLOC and kenrel should fail > the ALLOC if the user-space page table format is not compatible with underlying > iommu? Yes, the action should be Alloc an IOASID that points at a page table in this user memory, that is stored in this specific format. The supported formats should be discoverable after VFIO_BIND_IOASID_FD > > > /* > > > * Page fault report and response > > > * > > > * This is TBD. Can be added after other parts are cleared up. Likely it > > > * will be a ring buffer shared between user/kernel, an eventfd to notify > > > * the user and an ioctl to complete the fault. > > > * > > > * The fault data is per I/O address space, i.e.: IOASID + faulting_addr > > > */ > > > > Any reason not to just use read()? > > a ring buffer may be mmap to user-space, thus reading fault data from kernel > would be faster. This is also how Eric's fault reporting is doing today. Okay, if it is performance sensitive.. mmap rings are just tricky beasts > > > * Bind a vfio_device to the specified IOASID fd > > > * > > > * Multiple vfio devices can be bound to a single ioasid_fd, but a single > > > * vfio device should not be bound to multiple ioasid_fd's. > > > * > > > * Input parameters: > > > * - ioasid_fd; > > > * > > > * Return: 0 on success, -errno on failure. > > > */ > > > #define VFIO_BIND_IOASID_FD _IO(VFIO_TYPE, VFIO_BASE + 22) > > > #define VFIO_UNBIND_IOASID_FD _IO(VFIO_TYPE, VFIO_BASE + 23) > > > > This is where it would make sense to have an output "device id" that > > allows /dev/ioasid to refer to this "device" by number in events and > > other related things. > > perhaps this is the device info Jean Philippe wants in page fault reporting > path? Yes, it is Jason