Re: [PATCH RFC 08/12] iommufd: IOCTLs for the io_pagetable

On Fri, Apr 29, 2022 at 09:54:42AM -0300, Jason Gunthorpe wrote:
> On Fri, Apr 29, 2022 at 04:00:14PM +1000, David Gibson wrote:
> > > But I don't have a use case in mind? The simplified things I know
> > > about want to attach their devices then allocate valid IOVA, they
> > > don't really have a notion about what IOVA regions they are willing to
> > > accept, or necessarily do hotplug.
> > 
> > The obvious use case is qemu (or whatever) emulating a vIOMMU.  The
> > emulation code knows the IOVA windows that are expected of the vIOMMU
> > (because that's a property of the emulated platform), and requests
> > them of the host IOMMU.  If the host can supply that, you're good
> > (this doesn't necessarily mean the host windows match exactly, just
> > that the requested windows fit within the host windows).  If not,
> > you report an error.  This can be done at any point when the host
> > windows might change - so try to attach a device that can't support
> > the requested windows, and it will fail.  If you attach a device
> > which shrinks the windows but the requested windows still fit
> > within the narrowed ones, you're still good to go.
> 
> We were just talking about this in another area - Alex said that qemu
> doesn't know the IOVA ranges? Is there some vIOMMU cases where it does?

Uh.. what?  We certainly know (or, rather, choose) the IOVA ranges for
ppc.  That is to say we set up the default IOVA ranges at machine
construction (those defaults have changed with machine version a
couple of times).  If the guest uses dynamic DMA windows we then
update those ranges based on the hypercalls, but at any point we know
what the IOVA windows are supposed to be.  I don't really see how x86
or anything else could not know the IOVA ranges.  Who else *could* set
the ranges when implementing a vIOMMU in TCG mode?

For the non-vIOMMU case then IOVA==GPA, so everything qemu knows about
the GPA space it also knows about the IOVA space.  Which, come to
think of it, means memory hotplug also complicates things.

> Even if yes, qemu is able to manage this on its own - it doesn't use
> the kernel IOVA allocator, so there is not a strong reason to tell the
> kernel what the narrowed ranges are.

I don't follow.  The problem for the qemu case here is if you hotplug
a device which narrows down the range to something smaller than the
guest expects.  If qemu has told the kernel the ranges it needs, that
can just fail (which is the best you can do).  If the kernel adds the
device but narrows the ranges, then you may have just put the guest
into a situation where the vIOMMU cannot do what the guest expects it
to.  If qemu can only query the windows, not specify them, then it
won't know that adding a particular device will conflict with its
guest side requirements until after it's already added.  That could
mess up concurrent guest initiated map operations for existing devices
in the same guest side domain, so I don't think reversing the hotplug
after the problem is detected is enough.

> > > That is one possibility, yes. qemu seems to be using this to establish
> > > a clone ioas of an existing operational one which is another usage
> > > model.
> > 
> > Right, for qemu (or other hypervisors) the obvious choice would be to
> > create a "staging" IOAS where IOVA == GPA, then COPY that into the various
> > emulated bus IOASes.  For a userspace driver situation, I'm guessing
> > you'd map your relevant memory pool into an IOAS, then COPY to the
> > IOAS you need for whatever specific devices you're using.
> 
> qemu seems simpler, it juggled multiple containers so it literally
> just copies when it instantiates a new container and does a map in
> multi-container.

I don't follow you.  Are you talking about the vIOMMU or non-vIOMMU
case?  In the vIOMMU case the different containers can be for
different guest side iommu domains with different guest-IOVA spaces,
so you can't just copy from one to another.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
