On Mon, May 15, 2017 at 02:36:58PM +0100, Suzuki K Poulose wrote:
> On 15/05/17 11:00, Christoffer Dall wrote:
> >Hi Suzuki,
> >
> >On Wed, May 03, 2017 at 03:17:52PM +0100, Suzuki K Poulose wrote:
> >>We yield the kvm->mmu_lock occasionally while performing an operation
> >>(e.g., unmap or permission changes) on a large area of stage2 mappings.
> >>However this could possibly cause another thread to clear and free up
> >>the stage2 page tables while we were waiting to regain the lock, and
> >>thus the original thread could end up accessing memory that was
> >>freed. This patch fixes the problem by making sure that the stage2
> >>pagetable is still valid after we regain the lock. The fact that
> >>mmu_notifier->release() could be called twice (via __mmu_notifier_release
> >>and mmu_notifier_unregister) enhances the possibility of hitting
> >>this race where there are two threads trying to unmap the entire guest
> >>shadow pages.
> >>
> >>While at it, clean up the redundant checks around cond_resched_lock in
> >>stage2_wp_range(), as cond_resched_lock already does the same checks.
> >>
> >>Cc: Mark Rutland <mark.rutland@xxxxxxx>
> >>Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx>
> >>Cc: andreyknvl@xxxxxxxxxx
> >>Cc: Christoffer Dall <christoffer.dall@xxxxxxxxxx>
> >>Cc: Marc Zyngier <marc.zyngier@xxxxxxx>
> >>Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> >>Signed-off-by: Suzuki K Poulose <suzuki.poulose@xxxxxxx>
> >>---
> >> arch/arm/kvm/mmu.c | 17 ++++++++++++-----
> >> 1 file changed, 12 insertions(+), 5 deletions(-)
> >>
> >>diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> >>index 909a1a7..5b3e0db 100644
> >>--- a/arch/arm/kvm/mmu.c
> >>+++ b/arch/arm/kvm/mmu.c
> >>@@ -301,9 +301,14 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
> >> 		/*
> >> 		 * If the range is too large, release the kvm->mmu_lock
> >> 		 * to prevent starvation and lockup detector warnings.
> >>+		 * Make sure the page table is still active when we regain
> >>+		 * the lock.
> >> 		 */
> >>-		if (next != end)
> >>+		if (next != end) {
> >> 			cond_resched_lock(&kvm->mmu_lock);
> >>+			if (!READ_ONCE(kvm->arch.pgd))
> >>+				break;
> >>+		}
> >
> >So I don't think this change is wrong, but I wonder if it's sufficient.
> >For example, I can see that this function is called from
> >
> >stage2_unmap_vm
> >  -> stage2_unmap_memslot
> >     -> unmap_stage2_range
> >
> >and
> >
> >kvm_arch_flush_shadow_memslot
> >  -> unmap_stage2_range
> >
> >which never check if the pgd pointer is valid,
>
> You are right. Those two callers do not check it. We could fix all of this by simply
> moving the check to the beginning of the loop.
> i.e., something like this:
>
> @@ -295,6 +295,12 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
>  	assert_spin_locked(&kvm->mmu_lock);
>  	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>  	do {
> +		/*
> +		 * Make sure the page table is still active, as
> +		 * another thread could have possibly freed the page table.
> +		 */
> +		if (!READ_ONCE(kvm->arch.pgd))
> +			break;
>  		next = stage2_pgd_addr_end(addr, end);
>  		if (!stage2_pgd_none(*pgd))
>  			unmap_stage2_puds(kvm, pgd, addr, next);
>
> >
> >and finally, kvm_free_stage2_pgd also checks the pgd pointer outside of holding the
> >kvm->mmu_lock so why is this not racy?
>
> This has been fixed by patch 1 in the series. So should be fine.
>
>
> I can respin the patch with the changes if you are OK with it.
>

Yes, absolutely. I've already applied patch 1 so no need to include
that in your respin.

Thanks!
-Christoffer
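
For readers following the thread, the pattern agreed on above (re-check the pgd pointer at the top of every loop iteration, since the lock may have been dropped and the table freed in between) can be sketched in userspace as follows. This is only an illustrative model, not kernel code: struct toy_vm, toy_cond_resched_lock() and toy_unmap_range() are hypothetical names, the "other thread" is simulated by a counter, and a plain pointer test stands in for READ_ONCE(kvm->arch.pgd).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvm; only what the sketch needs. */
struct toy_vm {
	bool lock_held;    /* models kvm->mmu_lock being held */
	int *pgd;          /* models kvm->arch.pgd; NULL once freed */
	int yields_seen;   /* how many times the lock has been yielded */
	int free_at_yield; /* yield number at which the table is freed; -1 = never */
};

/*
 * Models cond_resched_lock(): the lock is dropped, another thread may
 * run and tear down the page table, then the lock is taken again.
 */
void toy_cond_resched_lock(struct toy_vm *vm)
{
	vm->lock_held = false;
	if (vm->yields_seen == vm->free_at_yield)
		vm->pgd = NULL; /* the "other thread" freed the table */
	vm->yields_seen++;
	vm->lock_held = true;
}

/*
 * Clears entries [start, end) and returns how many were cleared.
 * The pgd check sits at the top of the loop, so it covers both callers
 * that never checked the pointer and every yield inside the loop.
 */
size_t toy_unmap_range(struct toy_vm *vm, size_t start, size_t end)
{
	size_t cleared = 0;

	assert(vm->lock_held);
	for (size_t i = start; i < end; i++) {
		if (!vm->pgd) /* the kernel patch uses READ_ONCE() here */
			break;
		vm->pgd[i] = 0;
		cleared++;
		if (i + 1 != end)
			toy_cond_resched_lock(vm);
	}
	return cleared;
}
```

In this model, if the table is freed during the second yield, the loop stops after clearing two entries and never touches the rest; if the pointer is already NULL on entry, nothing is touched at all.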