On Thu, Jan 08, 2015 at 08:00:45PM +0100, Andrea Arcangeli wrote: > On Thu, Jan 08, 2015 at 11:59:06AM +0000, Marc Zyngier wrote: > > From: Steve Capper <steve.capper@xxxxxxxxxx> > > > > ptep_clear_flush_young_notify and pmdp_clear_flush_young_notify both > > call the notifiers *after* the pte/pmd has been made young. > > > > On x86 on EPT without hardware access bit (!shadow_accessed_mask), > we'll trigger a KVM page fault (gup_fast) which would mark the page > referenced to give it higher priority in the LRU (or set the accessed > bit if it's a THP). > > If we drop the KVM shadow pagetable before clearing the accessed bit > in the linux pte, there's a window where we won't set the young bit > for THP. For non-THP it's less of an issue because gup_fast calls > mark_page_accessed which rolls the lrus and sets the referenced bit in > the struct page, so the effect of mark_page_accessed doesn't get > lost when the linux pte accessed bit is cleared. > > We could also consider using mark_page_accessed in > follow_trans_huge_pmd to workaround the problem. I think setting the > young bit in gup_fast is correct and would be more similar to a real > CPU access (which is what gup_fast simulates anyway) so below patch > literally is introducing a race condition even if it's going to be > lost in the noise and it's not a problem. > > > This can cause problems with KVM that relies on being able to block > > MMU notifiers when carrying out maintenance of second stage > > descriptors. > > > > This patch ensures that the MMU notifiers are called before ptes and > > pmds are made old. > > Unfortunately I don't understand why this is needed. > > The only difference this can make to KVM is that without the patch, > kvm_age_rmapp is called while the linux pte is less likely to have the > accessed bit set (current behavior). It can still be set by hardware > through another CPU touching the memory before the mmu notifier is > invoked. > > With the patch the linux pte is more likely to have the accessed bit > set as it's not cleared before calling the mmu notifier. > > In both cases (at least in x86 where the accessed bit is always set in > hardware) the accessed bit may or may not be set. The pte can not > otherwise change as it's called with the PT lock. > > So again it looks a noop and it introduces a mostly theoretical race > condition for THP young bit in the linux pte with EPT and > !shadow_accessed_mask. > > Clearly there must be some arm obscure detail I'm not aware of that > makes this helpful but the description in commit header isn't enough > to get what's up with blocking mmu notifiers or such. Could you > elaborate? Hi Andrea, A big thank you for going through this patch in detail. Having had another think about this; apologies, I think I made a mistake in formulating this patch to try and fix an issue we were experiencing on arm64 (which has since been fixed). I think this patch can be safely ignored, especially as the kvm_mmu_notifier_clear_flush_young notifier does very little on arm64 (because kvm_age_hva returns 0, due to a lack of shadow pagetables). Sorry for the noise. Cheers, -- Steve -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html