There is a potential race condition between hypervisor page faults and flushing a memslot. It is possible for a page fault to read the memslot before a memslot is updated and then write a PTE to the partition-scoped page tables after kvmppc_radix_flush_memslot has completed. (Note that this race has never been explicitly observed.) To close this race, it is sufficient to increment the MMU sequence number while the kvm->mmu_lock is held. That will cause mmu_notifier_retry() to return true, and the page fault will then return to the guest without inserting a PTE. Signed-off-by: Paul Mackerras <paulus@xxxxxxxxxx> --- arch/powerpc/kvm/book3s_64_mmu_radix.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index bc3f795..aa41183 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -1130,6 +1130,11 @@ void kvmppc_radix_flush_memslot(struct kvm *kvm, kvm->arch.lpid); gpa += PAGE_SIZE; } + /* + * Increase the mmu notifier sequence number to prevent any page + * fault that read the memslot earlier from writing a PTE. + */ + kvm->mmu_notifier_seq++; spin_unlock(&kvm->mmu_lock); } -- 2.7.4