Am 29/09/2022 um 23:39 schrieb Sean Christopherson: > If we really want to provide a better experience for userspace, why not provide > more primitives to address those specific use cases? E.g. fix KVM's historic wart > of disallowing toggling of KVM_MEM_READONLY, and then add one or more ioctls to: > > - Merge 2+ memory regions that are contiguous in GPA and HVA > - Split a memory region into 2+ memory regions > - Truncate a memory region > - Grow a memory region I looked again at the code and specifically the use case that triggers the crash in bugzilla. I *think* (correct me if I am wrong), that the only operation that QEMU performs right now is "grow/shrink". So *if* we want to go this way, we could start with a simple grow/shrink API. Even though we need to consider that this could bring additional complexity in QEMU. Currently, DELETE+CREATE (grow/shrink) is not performed one after the other (in my innocent fantasy I was expecting to find 2 subsequent ioctls in the code), but in QEMU's address_space_set_flatview(), which seems to first remove all regions and then add them when changing flatviews. address_space_update_topology_pass(as, old_view2, new_view, adding=false); address_space_update_topology_pass(as, old_view2, new_view, adding=true); I don't think we can change this, as other listeners also rely on such ordering, but we can still batch all callback requests like I currently do and process them in kvm_commit(), figuring there which operation is which. In other words, we would have something like: region_del() --> DELETE memslot X -> add it to the queue of operations region_del() --> DELETE memslot Y -> add it to the queue of operations region_add() --> CREATE memslot X (size doubled) -> add it to the queue of operations region_add() --> CREATE memslot Y (size halved) -> add it to the queue of operations ... commit() --> scan QUEUE and figure what to do -> GROW X (+size), SHRINK Y (-size) -> issue 2 ioctls, GROW and SHRINK. > That would probably require more KVM code overall, but each operation would be more > tightly bounded and thus simpler to define. And I think more precise APIs would > provide other benefits, e.g. growing a region wouldn't require first deleting the > current region, and so could avoid zapping SPTEs and destroying metadata. Merge, > split, and truncate would have similar benefits, though they might be more > difficult to realize in practice. So essentially grow would not require INVALIDATE. Makes sense, but would it work also with shrink? I guess so, as the memslot is still present (but shrinked) right? Paolo, would you be ok with this smaller API? Probably just starting with grow and shrink first. I am not against any of the two approaches: - my approach has the disadvantage that the list could be arbitrarily long, and it is difficult to rollback the intermediate changes if something breaks during the request processing (but could be simplified by making kvm exit or crash). - Sean approach could potentially provide more burden to the userspace, as we need to figure which operation is which. Also from my understanding split and merge won't be really straightforward to implement, especially in userspace. David, any concern from userspace prospective on this "CISC" approach? Thank you, Emanuele