> From: David Gibson
> Sent: Thursday, June 17, 2021 12:08 PM
>
> On Thu, Jun 03, 2021 at 08:12:27AM +0000, Tian, Kevin wrote:
> > > From: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx>
> > > Sent: Wednesday, June 2, 2021 2:15 PM
> > >
> > [...]
> >
> > > >
> > > > /*
> > > >  * Get information about an I/O address space
> > > >  *
> > > >  * Supported capabilities:
> > > >  *	- VFIO type1 map/unmap;
> > > >  *	- pgtable/pasid_table binding
> > > >  *	- hardware nesting vs. software nesting;
> > > >  *	- ...
> > > >  *
> > > >  * Related attributes:
> > > >  *	- supported page sizes, reserved IOVA ranges (DMA mapping);
> > >
> > > Can I request we represent this in terms of permitted IOVA ranges,
> > > rather than reserved IOVA ranges.  This works better with the "window"
> > > model I have in mind for unifying the restrictions of the POWER IOMMU
> > > with Type1 like mapping.
> >
> > Can you elaborate how permitted range work better here?
>
> Pretty much just that MAP operations would fail if they don't entirely
> lie within a permitted range.  So, for example if your IOMMU only
> implements say, 45 bits of IOVA, then you'd have 0..0x1fffffffffff as
> your only permitted range.  If, like the POWER paravirtual IOMMU (in
> default configuration) you have a small (1G) 32-bit range and a large
> (45-bit) 64-bit range at a high address, you'd have say:
> 	0x00000000..0x3fffffff (32-bit range)
> and
> 	0x800000000000000 .. 0x8001fffffffffff (64-bit range)
> as your permitted ranges.
>
> If your IOMMU supports truly full 64-bit addressing, but has a
> reserved range (for MSIs or whatever) at 0xaaaa000..0xbbbb0000 then
> you'd have permitted ranges of 0..0xaaa9ffff and
> 0xbbbb0000..0xffffffffffffffff.

I see. I have incorporated this comment in v2.

>
> [snip]
> > > For debugging and certain hypervisor edge cases it might be useful to
> > > have a call to allow userspace to look up a specific IOVA in a guest
> > > managed pgtable.
> >
> > Since all the mapping metadata is from userspace, why would one
> > rely on the kernel to provide such service? Or are you simply asking
> > for some debugfs node to dump the I/O page table for a given
> > IOASID?
>
> I'm thinking of this as a debugging aid so you can make sure that
> the kernel is interpreting that metadata in the same way that your
> userspace expects it to interpret that metadata.
>

I'll not include it in this RFC. There is already too much in it. The
debugging aid can be added later, when it's actually required.

Thanks,
Kevin
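
P.S. For reference, the permitted-range rule quoted above (a MAP fails
unless it lies entirely within one permitted range) could be sketched
roughly as below. This is only an illustration under my own assumptions:
`struct iova_range` and `iova_map_permitted()` are hypothetical names,
not part of the proposed uAPI, and the bounds are taken as inclusive.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one permitted IOVA window, both bounds inclusive. */
struct iova_range {
	uint64_t start;
	uint64_t last;
};

/*
 * Hypothetical helper: return true only if [iova, iova + size - 1]
 * lies entirely within a single permitted range.
 */
static bool iova_map_permitted(const struct iova_range *ranges, size_t nr,
			       uint64_t iova, uint64_t size)
{
	uint64_t last;
	size_t i;

	if (size == 0)
		return false;

	last = iova + size - 1;
	if (last < iova)	/* request wraps past the top of 64 bits */
		return false;

	for (i = 0; i < nr; i++) {
		if (iova >= ranges[i].start && last <= ranges[i].last)
			return true;
	}

	return false;
}
```

With the 45-bit example, the only entry would be { 0, 0x1fffffffffff },
and a request that crosses the top of that window would be rejected even
though it starts inside it.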