Hi Will,

On 9/3/20 6:57 PM, Will Deacon wrote:
> On Wed, Sep 02, 2020 at 05:23:08PM +0100, Alexandru Elisei wrote:
>> On 8/25/20 10:39 AM, Will Deacon wrote:
>>> Convert unmap_stage2_range() to use kvm_pgtable_stage2_unmap() instead
>>> of walking the page-table directly.
>>>
>>> Cc: Marc Zyngier <maz@xxxxxxxxxx>
>>> Cc: Quentin Perret <qperret@xxxxxxxxxx>
>>> Signed-off-by: Will Deacon <will@xxxxxxxxxx>
>>> ---
>>>  arch/arm64/kvm/mmu.c | 57 +++++++++++++++++++++++++-------------------
>>>  1 file changed, 32 insertions(+), 25 deletions(-)
>>>
>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>> index 704b471a48ce..751ce2462765 100644
>>> --- a/arch/arm64/kvm/mmu.c
>>> +++ b/arch/arm64/kvm/mmu.c
>>> @@ -39,6 +39,33 @@ static bool is_iomap(unsigned long flags)
>>>  	return flags & KVM_S2PTE_FLAG_IS_IOMAP;
>>>  }
>>>
>>> +/*
>>> + * Release kvm_mmu_lock periodically if the memory region is large. Otherwise,
>>> + * we may see kernel panics with CONFIG_DETECT_HUNG_TASK,
>>> + * CONFIG_LOCKUP_DETECTOR, CONFIG_LOCKDEP. Additionally, holding the lock too
>>> + * long will also starve other vCPUs. We have to also make sure that the page
>>> + * tables are not freed while we released the lock.
>>> + */
>>> +#define stage2_apply_range(kvm, addr, end, fn, resched)		\
>>> +({								\
>>> +	int ret;						\
>>> +	struct kvm *__kvm = (kvm);				\
>>> +	bool __resched = (resched);				\
>>> +	u64 next, __addr = (addr), __end = (end);		\
>>> +	do {							\
>>> +		struct kvm_pgtable *pgt = __kvm->arch.mmu.pgt;	\
>>> +		if (!pgt)					\
>>> +			break;					\
>> I'm 100% sure there's a reason why we've dropped the READ_ONCE, but it still looks
>> to me like the compiler might decide to optimize by reading pgt once at the start
>> of the loop and stashing it in a register. Would you mind explaining what I am
>> missing?
> The load always happens with the mmu_lock held, so I think it's not a
> problem because it means that the pointer is stable.
> spin_lock()/spin_unlock() imply compiler barriers.

I think you are correct: if this is supposed to always execute with kvm->mmu_lock
held, then pgt cannot change between iterations. It didn't immediately occur to me
that this is the case, because we re-check whether pgt is NULL on every iteration.

If we are relying on the lock being held, maybe we should move the pgt load and
the comparison against NULL out of the loop (rough sketch at the end of this
mail)? That should avoid any confusion and make the code ever so slightly faster.

Also, I see that __unmap_stage2_range() checks that the mmu_lock is held, but we
don't have that check at the other call sites (for example, in stage2_wp_range()).
I realize this is me bikeshedding, but it looks a bit asymmetrical. Should we move
the assert_spin_locked(&kvm->mmu_lock) statement into stage2_apply_range(), since
it assumes that pgt will remain unchanged?

What do you think?

Thanks,
Alex
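
P.S. To make the suggestion a bit more concrete, here is a rough, untested sketch
of what I had in mind, based only on the part of the macro quoted above: the pgt
load and the NULL check are hoisted out of the loop, the assert_spin_locked() that
is currently in __unmap_stage2_range() moves in here to document the locking
assumption, and ret gets an initialiser because the walk may now not run at all.
The loop body and exit condition are elided; they would stay exactly as in your
patch.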
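
#define stage2_apply_range(kvm, addr, end, fn, resched)                 \
({                                                                      \
	int ret = 0;    /* the walk below may not run at all */         \
	struct kvm *__kvm = (kvm);                                      \
	bool __resched = (resched);                                     \
	u64 next, __addr = (addr), __end = (end);                       \
	struct kvm_pgtable *pgt;                                        \
									\
	/* We rely on mmu_lock to keep pgt stable for the whole walk */ \
	assert_spin_locked(&__kvm->mmu_lock);                           \
	pgt = __kvm->arch.mmu.pgt;                                      \
	if (pgt) {                                                      \
		/* [ the do { ... } while ( ... ) loop from your */     \
		/*   patch goes here, unchanged apart from the   */     \
		/*   load and the NULL check being gone ]        */     \
	}                                                               \
	ret;                                                            \
})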