Re: [PATCH 1/4] mm: Correct ordering of *_clear_flush_young_notify

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jan 08, 2015 at 08:00:45PM +0100, Andrea Arcangeli wrote:
> On Thu, Jan 08, 2015 at 11:59:06AM +0000, Marc Zyngier wrote:
> > From: Steve Capper <steve.capper@xxxxxxxxxx>
> > 
> > ptep_clear_flush_young_notify and pmdp_clear_flush_young_notify both
> > call the notifiers *after* the pte/pmd has been made young.
> > 
> 
> On x86 on EPT without hardware access bit (!shadow_accessed_mask),
> we'll trigger a KVM page fault (gup_fast) which would mark the page
> referenced to give it higher priority in the LRU (or set the accessed
> bit if it's a THP).
> 
> If we drop the KVM shadow pagetable before clearing the accessed bit
> in the linux pte, there's a window where we won't set the young bit
> for THP. For non-THP it's less of an issue because gup_fast calls
> mark_page_accessed which rolls the lrus and sets the referenced bit in
> the struct page, so the effect of mark_page_accessed doesn't get
> lost when the linux pte accessed bit is cleared.
> 
> We could also consider using mark_page_accessed in
> follow_trans_huge_pmd to workaround the problem.  I think setting the
> young bit in gup_fast is correct and would be more similar to a real
> CPU access (which is what gup_fast simulates anyway) so below patch
> literally is introducing a race condition even if it's going to be
> lost in the noise and it's not a problem.
> 
> > This can cause problems with KVM that relies on being able to block
> > MMU notifiers when carrying out maintenance of second stage
> > descriptors.
> > 
> > This patch ensures that the MMU notifiers are called before ptes and
> > pmds are made old.
> 
> Unfortunately I don't understand why this is needed.
> 
> The only difference this can make to KVM is that without the patch,
> kvm_age_rmapp is called while the linux pte is less likely to have the
> accessed bit set (current behavior). It can still be set by hardware
> through another CPU touching the memory before the mmu notifier is
> invoked.
> 
> With the patch the linux pte is more likely to have the accessed bit
> set as it's not cleared before calling the mmu notifier.
> 
> In both cases (at least in x86 where the accessed bit is always set in
> hardware) the accessed bit may or may not be set. The pte can not
> otherwise change as it's called with the PT lock.
> 
> So again it looks a noop and it introduces a mostly theoretical race
> condition for THP young bit in the linux pte with EPT and
> !shadow_accessed_mask.
> 
> Clearly there must be some arm obscure detail I'm not aware of that
> makes this helpful but the description in commit header isn't enough
> to get what's up with blocking mmu notifiers or such. Could you
> elaborate?

Hi Andrea,
A big thank you for going through this patch in detail.

Having had another think about this; apologies, I think I made a
mistake in formulating this patch to try and fix an issue we were
experiencing on arm64 (which has since been fixed).

I think this patch can be safely ignored, especially as the
kvm_mmu_notifier_clear_flush_young notifier does very little on arm64
(because kvm_age_hva returns 0, due to a lack of shadow pagetables).

Sorry for the noise.

Cheers,
-- 
Steve
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux