On 10/25/22 15:47, Hyeonggon Yoo wrote:
> On Mon, Oct 24, 2022 at 04:35:04PM +0200, Vlastimil Babka wrote:
> 
> [...]
> 
>> diff --git a/mm/slab.c b/mm/slab.c
>> index 59c8e28f7b6a..219beb48588e 100644
>> --- a/mm/slab.c
>> +++ b/mm/slab.c
>> @@ -1370,6 +1370,8 @@ static struct slab *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
>>  
>>  	account_slab(slab, cachep->gfporder, cachep, flags);
>>  	__folio_set_slab(folio);
>> +	/* Make the flag visible before any changes to folio->mapping */
>> +	smp_wmb();
>>  	/* Record if ALLOC_NO_WATERMARKS was set when allocating the slab */
>>  	if (sk_memalloc_socks() && page_is_pfmemalloc(folio_page(folio, 0)))
>>  		slab_set_pfmemalloc(slab);
>> @@ -1387,9 +1389,11 @@ static void kmem_freepages(struct kmem_cache *cachep, struct slab *slab)
>>  
>>  	BUG_ON(!folio_test_slab(folio));
>>  	__slab_clear_pfmemalloc(slab);
>> -	__folio_clear_slab(folio);
>>  	page_mapcount_reset(folio_page(folio, 0));
>>  	folio->mapping = NULL;
>> +	/* Make the mapping reset visible before clearing the flag */
>> +	smp_wmb();
>> +	__folio_clear_slab(folio);
>>  
>>  	if (current->reclaim_state)
>>  		current->reclaim_state->reclaimed_slab += 1 << order;
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 157527d7101b..6dc17cb915c5 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -1800,6 +1800,8 @@ static inline struct slab *alloc_slab_page(gfp_t flags, int node,
>>  
>>  	slab = folio_slab(folio);
>>  	__folio_set_slab(folio);
>> +	/* Make the flag visible before any changes to folio->mapping */
>> +	smp_wmb();
>>  	if (page_is_pfmemalloc(folio_page(folio, 0)))
>>  		slab_set_pfmemalloc(slab);
>>  
>> @@ -2008,8 +2010,10 @@ static void __free_slab(struct kmem_cache *s, struct slab *slab)
>>  	}
>>  
>>  	__slab_clear_pfmemalloc(slab);
>> -	__folio_clear_slab(folio);
>>  	folio->mapping = NULL;
>> +	/* Make the mapping reset visible before clearing the flag */
>> +	smp_wmb();
>> +	__folio_clear_slab(folio);
>>  	if (current->reclaim_state)
>>  		current->reclaim_state->reclaimed_slab += pages;
>>  	unaccount_slab(slab, order, s);
>> -- 
>> 2.38.0
> 
> Do we need to try this with memory barriers before frozen refcount lands?

There was IIRC an unresolved issue with frozen refcount tripping the page
isolation code, so I didn't want to depend on that.

> It's quite complicated and IIUC there is still a theoretical race:
> 
> At isolate_movable_page:        At slab alloc:          At slab free:
>                                 folio = alloc_pages(flags, order)
> 
> folio_try_get()
> folio_test_slab() == false
>                                 __folio_set_slab(folio)
>                                 smp_wmb()
> 
>                                                         call_rcu(&slab->rcu_head, rcu_free_slab);
> 
> 
> smp_rmb()
> __folio_test_movable() == true
> 
>                                                         folio->mapping = NULL;
>                                                         smp_wmb()
>                                                         __folio_clear_slab(folio);
> smp_rmb()
> folio_test_slab() == false
> 
> folio_trylock()

There's also this between the above and the below:

        if (!PageMovable(page) || PageIsolated(page))
                goto out_no_isolated;

        mops = page_movable_ops(page);

If we put another smp_rmb() before the PageMovable test, would that have
helped? It would ensure we observe the folio->mapping = NULL from the
"slab free" side. But yeah, it's getting ridiculous. Maybe there's a
simpler way to check two bits in two different bytes atomically. Or maybe
it's just an impossible task; I feel like I just dunno computers at this
point.

> mops->isolate_page() (*crash*)
> 
> Please let me know if I'm missing something ;-)
> Thanks!
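
To make the smp_rmb() idea above concrete, here is roughly where the extra
barrier would sit. This is only a sketch of the isolate_movable_page() flow
from the diagram, not a tested patch; the out_* labels and surrounding
details are illustrative, and it mixes folio and page accessors only
because the quoted snippet operates on the underlying page:

        if (unlikely(!folio_try_get(folio)))
                goto out;

        /* Bail out if this is (already) a slab folio. */
        if (unlikely(folio_test_slab(folio)))
                goto out_putfolio;
        /* Pairs with the smp_wmb() after __folio_set_slab() at slab alloc. */
        smp_rmb();
        if (unlikely(!__folio_test_movable(folio)))
                goto out_putfolio;

        /* Re-check: the folio may have become a slab folio meanwhile. */
        smp_rmb();
        if (unlikely(folio_test_slab(folio)))
                goto out_putfolio;

        if (unlikely(!folio_trylock(folio)))
                goto out_putfolio;

        /*
         * The extra barrier under discussion: it would pair with the
         * smp_wmb() between folio->mapping = NULL and __folio_clear_slab()
         * at slab free, so that once PG_slab is observed clear, the
         * mapping reset is observed as well before re-testing movability.
         */
        smp_rmb();

        if (!PageMovable(page) || PageIsolated(page))
                goto out_no_isolated;

        mops = page_movable_ops(page);

Whether that is enough for every possible interleaving is exactly the open
question above.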