Hi Will,

I'm answering my own question again. See below.

On 9/8/20 2:07 PM, Alexandru Elisei wrote:
> Hi Will,
>
> On 9/3/20 6:57 PM, Will Deacon wrote:
>> On Wed, Sep 02, 2020 at 05:23:08PM +0100, Alexandru Elisei wrote:
>>> On 8/25/20 10:39 AM, Will Deacon wrote:
>>>> Convert unmap_stage2_range() to use kvm_pgtable_stage2_unmap() instead
>>>> of walking the page-table directly.
>>>>
>>>> Cc: Marc Zyngier <maz@xxxxxxxxxx>
>>>> Cc: Quentin Perret <qperret@xxxxxxxxxx>
>>>> Signed-off-by: Will Deacon <will@xxxxxxxxxx>
>>>> ---
>>>>  arch/arm64/kvm/mmu.c | 57 +++++++++++++++++++++++++-------------------
>>>>  1 file changed, 32 insertions(+), 25 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>>> index 704b471a48ce..751ce2462765 100644
>>>> --- a/arch/arm64/kvm/mmu.c
>>>> +++ b/arch/arm64/kvm/mmu.c
>>>> @@ -39,6 +39,33 @@ static bool is_iomap(unsigned long flags)
>>>>  	return flags & KVM_S2PTE_FLAG_IS_IOMAP;
>>>>  }
>>>>
>>>> +/*
>>>> + * Release kvm_mmu_lock periodically if the memory region is large. Otherwise,
>>>> + * we may see kernel panics with CONFIG_DETECT_HUNG_TASK,
>>>> + * CONFIG_LOCKUP_DETECTOR, CONFIG_LOCKDEP. Additionally, holding the lock too
>>>> + * long will also starve other vCPUs. We have to also make sure that the page
>>>> + * tables are not freed while we released the lock.
>>>> + */
>>>> +#define stage2_apply_range(kvm, addr, end, fn, resched)		\
>>>> +({									\
>>>> +	int ret;							\
>>>> +	struct kvm *__kvm = (kvm);					\
>>>> +	bool __resched = (resched);					\
>>>> +	u64 next, __addr = (addr), __end = (end);			\
>>>> +	do {								\
>>>> +		struct kvm_pgtable *pgt = __kvm->arch.mmu.pgt;		\
>>>> +		if (!pgt)						\
>>>> +			break;						\
>>> I'm 100% sure there's a reason why we've dropped the READ_ONCE, but it still looks
>>> to me like the compiler might decide to optimize by reading pgt once at the start
>>> of the loop and stashing it in a register. Would you mind explaining what I am
>>> missing?
>> The load always happens with the mmu_lock held, so I think it's not a
>> problem because it means that the pointer is stable.
>> spin_lock()/spin_unlock() imply compiler barriers.
> I think you are correct: if this is supposed to always execute with kvm->mmu_lock
> held, then pgt should not change between iterations. It didn't immediately occur
> to me that that is the case because we check if pgt is NULL every iteration. If we
> are relying on the lock being held, maybe we should move the pgt load + comparison
> against NULL out of the loop? That should avoid any confusion and make the code
> ever so slightly faster.
>
> Also, I see that in __unmap_stage2_range() we check that the mmu_lock is held, but
> we don't check that at all call sites (for example, in stage2_wp_range()). I
> realize this is me bikeshedding, but that looks a bit asymmetrical. Should we move
> the assert_spin_locked(&kvm->mmu_lock) statement to stage2_apply_range(), since
> the function assumes the pgt will remain unchanged? What do you think?

What I wrote is wrong, because we can drop the lock in cond_resched_lock(). I
don't see the need for any changes.

Thanks,
Alex
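
[Editor's note] The conclusion above hinges on cond_resched_lock() being allowed to drop and re-take the mmu_lock in the middle of the walk, which is why the pgt pointer is re-loaded and re-checked against NULL on every iteration rather than hoisted out of the loop. Below is a minimal user-space sketch of that pattern, not the kernel code itself: a pthread mutex stands in for the kvm->mmu_lock spinlock, the unlock/lock pair stands in for cond_resched_lock(), and all names (table, walker(), teardown()) are invented for illustration.

/*
 * Minimal user-space analogy (not kernel code; names are made up).
 * Build with: cc -O2 -Wall -pthread sketch.c
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int *table;	/* stands in for kvm->arch.mmu.pgt */

/* Models another path tearing down the page tables under the lock. */
static void *teardown(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&lock);
	free(table);
	table = NULL;
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* Models the range walk done by stage2_apply_range(). */
static void walker(unsigned long addr, unsigned long end)
{
	pthread_mutex_lock(&lock);
	for (; addr < end; addr++) {
		/*
		 * Re-load and re-check on every iteration: the lock may
		 * have been dropped below, so a value cached before that
		 * window could point at freed memory.
		 */
		int *t = table;
		if (!t)
			break;

		t[0]++;	/* touch the tables; safe only while the lock is held */

		/* Periodically let others in, like cond_resched_lock(). */
		pthread_mutex_unlock(&lock);
		pthread_mutex_lock(&lock);
	}
	pthread_mutex_unlock(&lock);
}

int main(void)
{
	pthread_t t;

	table = calloc(16, sizeof(*table));
	pthread_create(&t, NULL, teardown, NULL);
	walker(0, 1UL << 20);
	pthread_join(t, NULL);
	printf("walk finished without touching freed memory\n");
	return 0;
}

Because the unlock/lock calls are opaque to the compiler, the load of table cannot be cached across them, which mirrors Will's point that spin_lock()/spin_unlock() imply compiler barriers. Hoisting the load and NULL check out of the loop, as suggested earlier in the thread, would risk walking tables freed in the window where the lock was dropped.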