On Tue, Jul 26, 2022 at 07:34:55AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Sent: Monday, July 25, 2022 10:37 PM
> >
> > On Mon, Jul 25, 2022 at 07:38:52AM +0000, Tian, Kevin wrote:
> >
> > > > Yes. qemu has to select a static aperture at start.
> > > >
> > > > The entire aperture is best, if that fails
> > > >
> > > > A smaller aperture and hope the guest doesn't use the whole space, if
> > > > that fails,
> > > >
> > > > The entire guest physical map and hope the guest is in PT mode
> > >
> > > That sounds a bit hacky... does it instead suggest that an interface
> > > for reporting the supported ranges on a tracker could be helpful once
> > > trying the entire aperture fails?
> >
> > It is the "try and fail" approach. It gives the driver the most
> > flexibility in processing the ranges to try and make them work. If we
> > attempt to describe all the device constraints that might exist we
> > will be here forever.
>
> Usually the caller of a 'try and fail' interface knows exactly what to
> be tried and then call the interface to see whether the callee can
> meet its requirement.

Which is exactly this case.

qemu has one thing to try that meets its full requirement - the entire
vIOMMU aperture.

The other two are possible options based on assumptions of how the
guest VM is operating that might work - but this guessing is entirely
between qemu and the VM, not something the kernel can help with.

So, from the kernel perspective qemu will try three things in order of
preference and the first to work will be the right one. Making the
kernel API more complicated is not going to help qemu guess what the
guest is doing any better.

In any case this is vIOMMU mode so if the VM establishes mappings
outside the tracked IOVA then qemu is aware of it and qemu can
perma-dirty those pages as part of its migration logic. It is not
broken, it just might not meet the SLA.

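To make the ordering concrete, here is a rough sketch of the userspace
side. It is illustration only, not qemu or kernel code: Range,
pick_tracked_ranges, needs_perma_dirty, fake_tracker and the sizes used
are all made-up names and numbers, and fake_tracker merely stands in for
whatever call asks the driver to start tracking a set of ranges.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Range:
    """A half-open IOVA range [start, end). Hypothetical helper type."""
    start: int
    end: int

    def contains(self, iova: int, length: int) -> bool:
        return self.start <= iova and iova + length <= self.end


Candidate = Tuple[str, List[Range]]


def pick_tracked_ranges(
    candidates: Sequence[Candidate],
    try_start: Callable[[List[Range]], bool],
) -> Optional[Candidate]:
    """Try each candidate in order of preference; the first accepted one wins."""
    for label, ranges in candidates:
        if try_start(ranges):
            return label, ranges
    return None  # nothing accepted: no device dirty tracking at all


def needs_perma_dirty(tracked: Sequence[Range], iova: int, length: int) -> bool:
    """True if a guest mapping is not fully inside one tracked range.

    Conservative: a mapping straddling two adjacent tracked ranges is
    also reported as needing perma-dirty treatment.
    """
    return not any(r.contains(iova, length) for r in tracked)


def fake_tracker(ranges: List[Range]) -> bool:
    # Assumed device limit for the example: at most 1 TiB tracked in total.
    return sum(r.end - r.start for r in ranges) <= 1 << 40


# Order of preference as described above (sizes are invented).
candidates = [
    ("entire vIOMMU aperture", [Range(0, 1 << 48)]),
    ("smaller aperture", [Range(0, 1 << 39)]),
    ("entire guest physical map", [Range(0, 1 << 32)]),
]

label, tracked = pick_tracked_ranges(candidates, fake_tracker)
```

With this fake limit the full aperture is refused and the smaller
aperture is what ends up tracked. Any mapping the guest later
establishes outside it is something userspace can see and treat as
always dirty, which is the "works, but may miss the SLA" case.
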
> But I can see why a reporting mechanism doesn't fit well with
> your example below. In the worst case probably the user has to
> decide between using vIOMMU vs. vfio DMA logging if a simple
> policy of using the entire aperture doesn't work...

Well, yes, this is exactly the situation unfortunately. Without
special HW support vIOMMU is not going to work perfectly, but there
are reasonable use cases that could work: where vIOMMU is on but the
guest is in PT mode, or where the IOVA aperture is limited, and so on.

Jason