Alexey Kardashevskiy wrote:
[..]
> > I thought existing use cases assume that the CC-VM can trigger page
> > conversions at will without regard to a vTOM concept? It would be nice
> > to have that address-map separation arrangement, has not that ship
> > already sailed?
>
> Mmm. I am either confusing you too much or not following you :) Any page
> can be converted, the proposed arrangement would require that
> conversion-candidate-pages are allocated from a specific pool?
>
> There are two vTOMs - one in IOMMU to decide on Cbit for DMA traffic (I
> use this one), one in VMSA ("VIRTUAL_TOM") for guest memory (this
> exercise is not using it). Which one do you mean?

Dunno, you introduced the vTOM term. Suffice to say if any page can be
converted in this model then I was confused.

> > [..]
> >>> Would the device not just launch in "shared" mode until it is later
> >>> converted to private? I am missing the detail of why passing the device
> >>> on the command line requires that private memory be mapped early.
> >>
> >> A sequencing problem.
> >>
> >> QEMU "realizes" a VFIO device, it creates an iommufd instance which
> >> creates a domain and writes to a DTE (an IOMMU descriptor for PCI BDFn).
> >> And DTE is not updated after that. For secure stuff, DTE needs to be
> >> slightly different. So right then I tell IOMMUFD that it will handle
> >> private memory.
> >>
> >> Then, the same VFIO "realize" handler maps the guest memory in iommufd.
> >> I use the same flag (well, pointer to kvm) in the iommufd pinning code,
> >> private memory is pinned and mapped (and related page state change
> >> happens as the guest memory is made guest-owned in RMP).
> >>
> >> QEMU goes to machine_reset() and calls "SNP LAUNCH UPDATE" (the actual
> >> place changed recently, huh) and the latter will measure the guest and
> >> try making all guest memory private but it already happened => error.
> >>
> >> I think I have to decouple the pinning and the IOMMU/DTE setting.
> >>
> >>> That said, the implication that private device assignment requires
> >>> hotplug events is a useful property. This matches nicely with initial
> >>> thoughts that device conversion events are violent and might as well be
> >>> unplug/replug events to match all the assumptions around what needs to
> >>> be updated.
> >>
> >> For the initial drop, I tell QEMU via "-device vfio-pci,x-tio=true" that
> >> it is going to be private so there should be no massive conversion.
> >
> > That's a SEV-TIO RFC-specific hack, or a proposal?
>
> Not sure at the moment :)

Ok, without more information it looks like a SEV-TIO shortcut.

> > An approach that aligns more closely with the VFIO operational model,
> > where it maps and waits for guest faults / usages, is that QEMU would be
> > told that the device is "bind capable", because the host is not in a
> > position to assume how the guest will use the device. A "bind capable"
> > device operates in shared mode unless and until the guest triggers
> > private conversion.
>
> True. I just started this exercise without QEMU DiscardManager. Now I
> rely on it but it either needs to allow dynamic flip from
> discarded==private to discarded==shared (should do for now) or allow 3
> states for guest pages.

As we talked about on the KernelSIG call, there is potentially a
guest_memfd proposal to handle in-place conversion without a
DiscardManager:

https://lore.kernel.org/kvm/20240712232937.2861788-1-ackerleytng@xxxxxxxxxx/

[..]
> > Per above, the tradeoff should be in ROI, not ugliness. I don't see how
> > OVMF helps when devices might be being virtually hotplugged or reset.
>
> I have no clue how exactly hotplug works on x86, is not BIOS playing
> role in it? Thanks,

The hotplug controller can either be native PCIe or firmware managed.
Likely we would pick the path of least resistance for QEMU to
facilitate device conversion.