On Fri, Aug 17, 2012 at 05:06:18PM +1000, Benjamin Herrenschmidt wrote: > On Wed, 2012-08-15 at 14:59 -0300, Marcelo Tosatti wrote: > > > > The guest should not expect memory accesses to an address > > to behave sanely while changing a BAR anyway. > > > > Yes, compatibility for change of GPA base can be done in the > > kernel. I can look into it next week if nobody has done so at > > that point. > > There's one thing to be extra careful about here is if we start > doing that for normal memory (in case we start breaking it up > in slots, such as NUMA setups etc...). > > The problem is that we must not allow normal memory accesses to be > handled via the "emulation" code (ie MMIO emulation or load/store > emulation, whatever we call it). > > Part of the issues is that on architectures that don't use IPIs for > TLB invalidations but instead use some form of HW broadcast such as > PowerPC or ARM, there is an inherent race in that the emulation code can > keep a guest physical address (and perform the relevant access to the > corresponding memory region) way beyond the point where the guest > virtual->physical translation leading to that address has been > invalidated. > > This doesn't happen on x86 because essentially the completion of the > invalidation IPI has to wait for all VCPUs to "respond" and thus to > finish whatever emulation they are doing. This is not the case on archs > with a HW invalidate broadcast. > > This is a nasty race, and while we more/less decided that it was > survivable as long as we only go through emulation for devices (as we > don't play swapping games with them in the guest kernel), the minute we > allow normal guest memory access to "slip through", we have broken the > guest virtual memory model. This emulation is in hardware, yes? It is the lack of a TLB entry (or the lack of a valid pagetable to fill the TLB) that triggers it? > So if we are manipulated memory slots used for guest RAM we must -not- > break atomicity, since during the time the slot is gone, it will > fallback to emulation, introducing the above race (at least on PowerPC > and ARM). You could say get the vcpus to a known state (which has a side effect of making sure that emulation is stopped), no? (just as a mental exercise). > Cheers, > Ben. Yes. Well, Avi mentioned earlier that there are users for change of GPA base. But, if my understanding is correct, the code that emulates change of BAR in QEMU is: /* now do the real mapping */ if (r->addr != PCI_BAR_UNMAPPED) { memory_region_del_subregion(r->address_space, r->memory); } r->addr = new_addr; if (r->addr != PCI_BAR_UNMAPPED) { memory_region_add_subregion_overlap(r->address_space, r->addr, r->memory, 1); These translate to two kvm_set_user_memory ioctls. "> Without taking into consideration backwards compatibility, userspace > can first delete the slot and later create a new one. Current qemu will in fact do that. Not sure about older ones. " Avi, where it does that? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html