On Thu, Sep 29, 2022, Emanuele Giuseppe Esposito wrote: > > Am 28/09/2022 um 22:41 schrieb Sean Christopherson: > > Beyond that, there's no explanation of why this exact API is necessary, i.e. there > > are no requirements given. > > > > - Why can't this be solved in userspace? > > Because this would provide the "feature" only to QEMU, leaving each > other hypervisor to implement its own. But there's no evidence that any other VMM actually needs this feature. > > - When is KVM required to invalidate and flush? E.g. if a memslot is deleted > > and recreated with a different HVA, how does KVM ensure that there are no > > outstanding references to the old HVA without introducing non-determinstic > > behavior. The current API works by forcing userspace to fully delete the > > memslot, i.e. KVM can ensure all references are gone in all TLBs before > > allowing userspace to create new, conflicting entries. I don't see how this > > can work with batched updates. The update needs to be "atomic", i.e. vCPUs > > must never see an invalid/deleted memslot, but if the memslot is writable, > > how does KVM prevent some writes from hitting the old HVA and some from hitting > > the new HVA without a quiescent period? > > Sorry this might be my fault in providing definitions, even though I > think I made plenty of examples. Delete/move operations are not really > "atomic" in the sense that the slot disappears immediately. > > The slot(s) is/are first "atomically" invalidated, allowing the guest to > retry the page fault I completely forgot that KVM x86 now retries on KVM_MEMSLOT_INVALID[*]. That is somewhat arbitrary behavior, e.g. if x86 didn't do MMIO SPTE caching it wouldn't be "necessary" and x86 would still treat INVALID as MMIO. It's also x86 specific, no other architecture retries on invalid memslots, i.e. relying on that behavior to provide "atomicity" won't work without updating all other architectures. That's obviously doable, but since these updates aren't actually atomic, I don't see any meaningful benefit over adding e.g. KVM_MEM_DISABLED. [*] e0c378684b65 ("KVM: x86/mmu: Retry page faults that hit an invalid memslot") > > - If a memslot is truncated while dirty logging is enabled, what happens to > > the bitmap? Is it preserved? Dropped? > > Can you explain what you mean with "truncated memslot"? Shrink the size of the memslot. > Regarding the bitmap, currently QEMU should (probably wrongly) update it > before even committing the changes to KVM. But I remember upstream > someone pointed that this could be solved later. These details can't be punted, as they affect the ABI. My interpretation of "atomic" memslot updates was that KVM would truly honor the promise of atomic updates, e.g. that a DELETE=>CREATE sequence would effectively allow truncating a memslot without losing any data. Those details aren't really something that can be punted down the road to be fixed at a later point because they very much matter from an ABI perspective. > > Again, I completely agree that the current memslots API is far from perfect, but > > I'm not convinced that simply extending the existing API to batch updates is the > > best solution from a KVM perspective. > > > >>> E.g. why do a batch update and not provide KVM_SET_ALL_USER_MEMORY_REGIONS to > >>> do wholesale replacement? That seems like it would be vastly simpler to handle > >>> on KVM's end. Or maybe there's a solution in the opposite direction, e.g. an > >>> API that allows 1->N or N->1 conversions but not arbitrary batching. > >> > >> Wholesale replacement was my first idea when I looked at the issue, I think > >> at the end of 2020. I never got to a full implementation, but my impression > >> was that allocating/deallocating dirty bitmaps, rmaps etc. would make it any > >> easier than arbitrary batch updates. > > > > It's not obvious to me that the memslot metadata is going to be easy to handle > > regardless of what we do. E.g. I'm pretty sure that batching updates will "corrupt" > > the dirty bitmap if a hole is punched in a memslot that's being dirty logged. > > > > Could you provide an example? > I mean I am not an expert but to me it looks like I preserved the same > old functionalities both here and in QEMU. I just batched and delayed > some operations, which in this case should cause no harm. As above, my confusion is due to calling these updates "atomic". The new API is really just "allow batching memslot updates and subtly rely on restarting page faults on KVM_MEMSLOT_INVALID". IMO, KVM_MEM_DISABLED or similar is the way to go. I.e. formalize the "restart page faults" semantics without taking on the complexity of batched updates.