On Wed, Apr 27, 2016 at 04:59:57PM +0200, Andrea Arcangeli wrote:
> On Wed, Apr 27, 2016 at 04:50:30PM +0300, Kirill A. Shutemov wrote:
> > I know nothing about kvm. How do you protect against pmd splitting between
> > get_user_pages() and the check?
>
> get_user_pages_fast() runs fully lockless and unpins the page right
> away (we need a get_user_pages_fast without the FOLL_GET in fact to
> avoid a totally useless atomic_inc/dec!).
>
> Then we take a lock that is also taken by
> mmu_notifier_invalidate_range_start. This way __split_huge_pmd will
> block in mmu_notifier_invalidate_range_start if it tries to run again
> (every other mmu notifier like mmu_notifier_invalidate_page will also
> block).
>
> Then after we serialized against __split_huge_pmd through the MMU
> notifier KVM internal locking, we are able to tell if any mmu_notifier
> invalidate happened in the region just before get_user_pages_fast()
> was invoked, until we call PageTransCompoundMap and we actually map
> the shadow pagetable into the compound page with hugepage
> granularity (to allow real 2MB TLBs if guest also uses trans_huge_pmd
> in the guest pagetables).
>
> After the shadow pagetable is mapped, we drop the internal MMU
> notifier lock and __split_huge_pmd mmu_notifier_invalidate_range_start
> can continue and drop the shadow pagetable that we just mapped in the
> above paragraph just before dropping the mmu notifier internal lock.
>
> To be able to tell if any invalidate happened while
> get_user_pages_fast was running and until we grab the lock again and
> we start mapping the shadow pagetable we use:
>
> 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
> 	smp_rmb();
>
> 	if (try_async_pf(vcpu, prefault, gfn, v, &pfn, write, &map_writable))
> 	    ^^^^^^^^^^^^ this is get_user_pages and does put_page on the page
> 			 and just returns the &pfn
> 			 this is why we need a get_user_pages_fast that won't
> 			 attempt to touch the page->_count at all!
> 			 we can avoid 2 atomic ops for each secondary MMU
> 			 fault that way
> 		return 0;
>
> 	spin_lock(&vcpu->kvm->mmu_lock);
> 	if (mmu_notifier_retry(vcpu->kvm, mmu_seq))
> 		goto out_unlock;
> 	... here we check PageTransCompoundMap(pfn_to_page(pfn)) and
> 	    map a 4k or 2MB shadow pagetable on "pfn" ...
>
>
> Note mmu_notifier_retry does the other side of the smp_rmb():
>
> 	smp_rmb();
> 	if (kvm->mmu_notifier_seq != mmu_seq)
> 		return 1;
> 	return 0;

Okay, I see.

But do we really want to make PageTransCompoundMap() visible beyond KVM
code? It looks too KVM-specific.

-- 
 Kirill A. Shutemov
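
For readers who do not know the KVM side, the snapshot-then-recheck scheme quoted above can be tried out in userspace. What follows is only a minimal sketch under stated assumptions, not the KVM code: struct toy_mmu and every toy_* function are hypothetical names, a C11 atomic_flag spinlock stands in for kvm->mmu_lock, an atomic counter stands in for mmu_notifier_seq, and an acquire load stands in for the plain read followed by smp_rmb(). The invalidate side bumps the counter under the lock, so a fault that took its snapshot before the invalidate refuses to map.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the KVM state discussed in the thread. */
struct toy_mmu {
	atomic_flag lock;          /* plays the role of kvm->mmu_lock */
	atomic_ulong notifier_seq; /* plays the role of mmu_notifier_seq */
	unsigned long mapped_pfn;  /* 0 means "no shadow mapping installed" */
};

void toy_mmu_init(struct toy_mmu *m)
{
	atomic_flag_clear(&m->lock);
	atomic_init(&m->notifier_seq, 0);
	m->mapped_pfn = 0;
}

static void toy_lock(struct toy_mmu *m)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&m->lock, memory_order_acquire))
		;
}

static void toy_unlock(struct toy_mmu *m)
{
	atomic_flag_clear_explicit(&m->lock, memory_order_release);
}

/* Invalidate side: bump the sequence and drop the mapping, under the lock. */
void toy_invalidate(struct toy_mmu *m)
{
	toy_lock(m);
	atomic_fetch_add(&m->notifier_seq, 1);
	m->mapped_pfn = 0;
	toy_unlock(m);
}

/* Fault side, step 1: snapshot the sequence before the lockless lookup. */
unsigned long toy_fault_begin(struct toy_mmu *m)
{
	return atomic_load_explicit(&m->notifier_seq, memory_order_acquire);
}

/*
 * Fault side, step 2: take the lock and map only if no invalidate ran
 * since the snapshot. Returns false when the caller has to retry.
 */
bool toy_fault_finish(struct toy_mmu *m, unsigned long seq, unsigned long pfn)
{
	bool ok;

	assert(pfn != 0); /* 0 is reserved for "nothing mapped" in this toy */
	toy_lock(m);
	ok = atomic_load(&m->notifier_seq) == seq;
	if (ok)
		m->mapped_pfn = pfn;
	toy_unlock(m);
	return ok;
}
```

A caller would run toy_fault_begin(), do its lockless page lookup, then call toy_fault_finish() and start over if it returns false. The sketch leaves out everything specific to huge pages; it only illustrates why a split that raced with the lookup cannot leave a stale mapping behind.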