On 19/05/2015 15:32, Radim Krčmář wrote: > I'd prefer to decouple address spaces and slots. KVM_GET_DIRTY_LOG > could stay the same if we said that a slot can be in multiple address > spaces. Actually, my original plan was a third one. I wanted to have a fixed number of address spaces, where CPUs would use only the first ones (0 for non-x86, 0+1 for x86). Then I would let userspace pick a "dirty log address space" which, in the case of QEMU, would simply use ram_addr_t as the address. However, that doesn't work because when marking a page dirty you need either a (slots, gfn) or a (memslot, relative address in memslot) pair. Given the gfn that the VCPU has faulted on, it would be very expensive to find the corresponding gfn in the "dirty log address space". On the contrary, it's very easy if the VCPU can simply query its current memslots. In your proposal, there is the problem of where to do the overlap check. You cannot do it in the slots because it messes up userspace seriously. QEMU for example wants to use as few slots as possible and merges consecutive slots; but what is mergeable in one address space need not be mergeable in another. And if you do it in the address spaces, you have not solved the problem of multiple dirty bitmaps pointing for the same userspace address. BTW, a few more calls have to be converted to kvm_vcpu_ equivalents. I've now gone through all occurrences of "gfn_to_" and found that we have more invocations of gfn_to_page_many_atomic, gfn_to_pfn_atomic, gfn_to_pfn, and gfn_to_page to convert. Also, mark_page_dirty must be changed to kvm_vcpu_mark_page_dirty. > (Well, we could do that even now, by searching for slots that > differ only in address space id and pointing them to same dirty bitmap. > This even makes sense, which is a sign of lacking design :) I thought about this, but ultimately it sounded like premature optimization (and also might not even work due to the aforementioned problem with merging of adjacent slots). A possible optimization is to set a flag when no bits are set in the dirty bitmap, and skip the iteration. This is the common case for the SMM memory space for example. But it can be done later, and is an independent work. Keeping slots separate for different address spaces also makes the most sense in QEMU, because each address space has a separate MemoryListener that only knows about one address space. There is one log_sync callback per address space and no code has to know about all address spaces. > The main drawback is that forcing decoupling on existing IOCTLs would be > really awkward ... we'd need to have new API for address spaces; > there are two basic operations on an address space: > add and remove slot (needs: slot id, address space id) > which would give roughly the same functionality as this patch. > (To maximixe compatibility, there could be a default address space and a > slot flag that doesn't automatical > On top of this, we could allow hierarchical address spaces, so very similar > address spaces (like SMM) would be easier to set up. That would actually be really messy. :) The regular and SMM address spaces are not hierarchical. As soon as you put a PCI resource underneath SMRAM---which is exactly what happens for legacy VRAM at 0xa0000---they can be completely different. Note that QEMU can map legacy VRAM as a KVM memslot when using the VGA 320x200x256 color mode (this mapping is not correct from the VGA point of view, but it cannot be changed in QEMU without breaking migration). What I do dislike in the API is the 16-bit split of the slots field; but the alternative of defining KVM_SET_USER_MEMORY_REGION2 and KVM_GET_DIRTY_LOG2 ioctls is just as sad. If you prefer it that way, I can certainly do that. Is it okay for you if I separate the cleanup patches, and then repost the actual SMM series after the API discussions have settled? The cleanups should be easily reviewable, and they make some sense on their own. I'm currently autotesting them. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html