On Fri, Oct 01, 2021 at 04:19:22PM +1000, david@xxxxxxxxxxxxxxxxxxxxx wrote:
> On Wed, Sep 22, 2021 at 11:09:11AM -0300, Jason Gunthorpe wrote:
> > On Wed, Sep 22, 2021 at 03:40:25AM +0000, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > > Sent: Wednesday, September 22, 2021 1:45 AM
> > > >
> > > > On Sun, Sep 19, 2021 at 02:38:39PM +0800, Liu Yi L wrote:
> > > > > This patch adds IOASID allocation/free interface per iommufd. When
> > > > > allocating an IOASID, userspace is expected to specify the type and
> > > > > format information for the target I/O page table.
> > > > >
> > > > > This RFC supports only one type (IOMMU_IOASID_TYPE_KERNEL_TYPE1V2),
> > > > > implying a kernel-managed I/O page table with vfio type1v2 mapping
> > > > > semantics. For this type the user should specify the addr_width of
> > > > > the I/O address space and whether the I/O page table is created in
> > > > > an iommu enforce_snoop format. enforce_snoop must be true at this point,
> > > > > as the false setting requires additional contract with KVM on handling
> > > > > WBINVD emulation, which can be added later.
> > > > >
> > > > > Userspace is expected to call IOMMU_CHECK_EXTENSION (see next patch)
> > > > > for what formats can be specified when allocating an IOASID.
> > > > >
> > > > > Open:
> > > > > - Devices on PPC platform currently use a different iommu driver in vfio.
> > > > >   Per previous discussion they can also use vfio type1v2 as long as there
> > > > >   is a way to claim a specific iova range from a system-wide address space.
> > > > >   This requirement doesn't sound PPC specific, as addr_width for pci devices
> > > > >   can also be represented by a range [0, 2^addr_width-1]. This RFC hasn't
> > > > >   adopted this design yet. We hope to have formal alignment in v1 discussion
> > > > >   and then decide how to incorporate it in v2.
> > > >
> > > > I think the request was to include a start/end IO address hint when
> > > > creating the ios. When the kernel creates it then it can return the
> > >
> > > is the hint single-range or could be multiple-ranges?
> >
> > David explained it here:
> >
> > https://lore.kernel.org/kvm/YMrKksUeNW%2FPEGPM@yekko/
>
> Apparently not well enough. I've attempted again in this thread.
>
> > qemu needs to be able to choose if it gets the 32 bit range or 64
> > bit range.
>
> No. qemu needs to supply *both* the 32-bit and 64-bit range to its
> guest, and therefore needs to request both from the host.

As I understood your remarks, each IOAS can only be one of the formats,
as they have a different PTE layout. So here I meant that qemu needs to
be able to pick *for each IOAS* which of the two formats it is.

> Or rather, it *might* need to supply both. It will supply just the
> 32-bit range by default, but the guest can request the 64-bit range
> and/or remove and resize the 32-bit range via hypercall interfaces.
> Vaguely recent Linux guests certainly will request the 64-bit range in
> addition to the default 32-bit range.

And this would result in two different IOAS objects.

Jason
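
A minimal sketch of the two points above, in C. All names here
(ioas_format, ioas_request, ioas_from_addr_width, ioas_ranges_disjoint)
are hypothetical and are not the uAPI proposed in this RFC. It only shows
an addr_width written as the range [0, 2^addr_width-1], and one request
per format, so that a guest using both the 32-bit and the 64-bit range
ends up with two separate objects:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; none of this is the iommufd uAPI. */

/* PTE layout of an I/O address space; each IOAS has exactly one. */
enum ioas_format {
	IOAS_FMT_WINDOW_32BIT,	/* default 32-bit range */
	IOAS_FMT_WINDOW_64BIT,	/* 64-bit range requested by the guest */
};

/* Inclusive IOVA range [start, end] passed as a hint at creation. */
struct ioas_request {
	enum ioas_format format;
	uint64_t start;
	uint64_t end;
};

/* Express an addr_width as the range [0, 2^addr_width - 1]. */
static struct ioas_request ioas_from_addr_width(enum ioas_format format,
						unsigned int addr_width)
{
	struct ioas_request req = { .format = format, .start = 0 };

	assert(addr_width >= 1 && addr_width <= 64);
	/* Shifting a 64-bit value by 64 is undefined, so handle it apart. */
	req.end = addr_width == 64 ? UINT64_MAX
				   : (UINT64_C(1) << addr_width) - 1;
	return req;
}

/* True if the two inclusive ranges share no address. */
static bool ioas_ranges_disjoint(const struct ioas_request *a,
				 const struct ioas_request *b)
{
	return a->end < b->start || b->end < a->start;
}
```

Under these assumptions, a guest that asks for the 64-bit range in
addition to the default one would be described by two ioas_request
values, one per format, with ranges that do not overlap.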