On Wed, Nov 04, 2020 at 07:27:21PM +0800, Alex Shi wrote:
> On 2020/11/3 4:20 AM, Johannes Weiner wrote:
> > On Mon, Nov 02, 2020 at 02:49:27PM +0000, Matthew Wilcox wrote:
> >> On Mon, Nov 02, 2020 at 09:41:10AM -0500, Johannes Weiner wrote:
> >>> On Thu, Oct 29, 2020 at 06:44:53PM +0800, Alex Shi wrote:
> >>>> From: Hugh Dickins <hughd@xxxxxxxxxx>
> >>>>
> >>>> It is necessary for page_idle_get_page() to recheck PageLRU() after
> >>>> get_page_unless_zero(), but holding lru_lock around that serves no
> >>>> useful purpose, and adds to lru_lock contention: delete it.
> >>>>
> >>>> See https://lore.kernel.org/lkml/20150504031722.GA2768@blaptop for the
> >>>> discussion that led to lru_lock there; but __page_set_anon_rmap() now
> >>>> uses WRITE_ONCE(),
> >>>
> >>> That doesn't seem to be the case in Linus's or Andrew's tree. Am I
> >>> missing a dependent patch series?
> >>>
> >>>> and I see no other risk in page_idle_clear_pte_refs() using
> >>>> rmap_walk() (beyond the risk of racing PageAnon->PageKsm, mostly but
> >>>> not entirely prevented by the page_count() check in ksm.c's
> >>>> write_protect_page(): that risk being shared with page_referenced()
> >>>> and not helped by lru_lock).
> >>>
> >>> Isn't it possible, as per Minchan's description, for page->mapping to
> >>> point to a struct anon_vma without PAGE_MAPPING_ANON set, and rmap
> >>> thinking it's looking at a struct address_space?
> >>
> >> I don't think it can point to an anon_vma without the ANON bit set.
> >> Minchan's concern in that email was that it might still be NULL.
> >
> > Hm, no, the thread is a lengthy discussion about whether the store
> > could be split such that page->mapping is actually pointing to
> > something invalid (an anon_vma without the PageAnon bit).
> >
> > From his email:
> >
> >     CPU 0                                  CPU 1
> >
> >     do_anonymous_page
> >       __page_set_anon_rmap
> >       /* out of order happened so SetPageLRU is done ahead */
> >       SetPageLRU(page)
>
> This SetPageLRU is done in __pagevec_lru_add_fn(), which runs under
> lru_lock protection, so the original memory barrier or lock concern
> doesn't apply: the SetPageLRU can't happen here. And then there is
> no worry about the 'CPU 1' problem on the right side.

The SetPageLRU is done under lru_lock, but the store to page->mapping
is not, so what ensures ordering between them? And what prevents the
compiler from tearing the store to page->mapping?

The writer does this:

	CPU 0
	page_add_new_anon_rmap()
	  page->mapping = anon_vma + PAGE_MAPPING_ANON
	lru_cache_add_inactive_or_unevictable()
	  spin_lock(lruvec->lock)
	  SetPageLRU()
	  spin_unlock(lruvec->lock)

The concern is what CPU 1 will observe at page->mapping after
observing PageLRU set on the page.

1. anon_vma + PAGE_MAPPING_ANON

   That's the in-order scenario and is fine.

2. NULL

   That's possible if the page->mapping store gets reordered to occur
   after SetPageLRU. That's fine too because we check for it.

3. anon_vma without the PAGE_MAPPING_ANON bit

   That would be a problem and could lead to all kinds of undesirable
   behavior, including crashes and data corruption.

   Is it possible? AFAICT the compiler is allowed to tear the store to
   page->mapping, and I don't see anything that would prevent it.

That said, I also don't see how the reader testing PageLRU under the
lru_lock would prevent that in the first place. AFAICT we need that
WRITE_ONCE() around the page->mapping assignment that the changelog of
this patch references, but which I cannot find in any tree or patch.
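
For illustration, here is a minimal sketch of the store/load pairing
that the missing WRITE_ONCE() would have to establish -- my reading of
what that dependency must look like, not code from any tree.
WRITE_ONCE() forbids the compiler from splitting the tagged-pointer
store into narrower stores, so a reader pairing it with READ_ONCE()
can only observe NULL or the complete anon_vma + PAGE_MAPPING_ANON
value, which rules out scenario 3 above:

	/* Writer side, in __page_set_anon_rmap() (sketch): */
	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
	WRITE_ONCE(page->mapping, (struct address_space *) anon_vma);

	/* Reader side, along the lines of page_anon_vma() (sketch): */
	unsigned long mapping = (unsigned long) READ_ONCE(page->mapping);

	if (!mapping)
		return NULL;	/* scenario 2: store not visible yet */
	if (!(mapping & PAGE_MAPPING_ANON))
		return NULL;	/* a real struct address_space */
	/* scenario 1: a complete, valid anon_vma pointer */
	return (struct anon_vma *) (mapping - PAGE_MAPPING_ANON);

Note that WRITE_ONCE()/READ_ONCE() only rule out tearing; by
themselves they do not order the page->mapping store against
SetPageLRU(), which is the other half of the question above.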