On Sat, Dec 05, 2020 at 04:34:23PM +0100, Oscar Salvador wrote:
> On Fri, Dec 04, 2020 at 06:25:31PM +0100, Vlastimil Babka wrote:
> > OK, so that means we don't introduce this race for MADV_SOFT_OFFLINE, but it's
> > already (and still) there for MADV_HWPOISON since Dan's 23e7b5c2e271 ("mm,
> > madvise_inject_error: Let memory_failure() optionally take a page reference") no?
>
> What about the following?
> CCing Dan as well.

Hi Oscar, Vlastimil,

Thanks for mentioning this. I agree with that direction.

>
> From: Oscar Salvador <osalvador@xxxxxxx>
> Date: Sat, 5 Dec 2020 16:14:40 +0100
> Subject: [PATCH] mm,memory_failure: Always pin the page in
>  madvise_inject_error
>
> madvise_inject_error() uses get_user_pages_fast to get the page
> from the addr we specified.
> After [1], we drop such extra reference for memory_failure() path.
> That commit says that memory_failure wanted to keep the pin in order
> to take the page out of circulation.
>
> The truth is that we need to keep the page pinned, otherwise the
> page might be re-used after the put_page(), and we can end up messing
> with someone else's memory.
> E.g:
>
> CPU0
> process X                              CPU1
>  madvise_inject_error
>   get_user_pages
>    put_page
>                                        page gets reclaimed
>                                        process Y allocates the page
>   memory_failure
>    // We mess with process Y memory
>
> madvise() is meant to operate on a self address space, so messing with
> pages that do not belong to us seems the wrong thing to do.
> To avoid that, let us keep the page pinned for memory_failure as well.
>
> Pages for DAX mappings will release this extra refcount in
> memory_failure_dev_pagemap.
>
> [1] ("23e7b5c2e271: mm, madvise_inject_error:
>     Let memory_failure() optionally take a page reference")
>
> Signed-off-by: Oscar Salvador <osalvador@xxxxxxx>
> Suggested-by: Vlastimil Babka <vbabka@xxxxxxx>
> Fixes: 23e7b5c2e271 ("mm, madvise_inject_error: Let memory_failure() optionally take a page reference")
> ---
>  mm/madvise.c        | 9 +--------
>  mm/memory-failure.c | 6 ++++++
>  2 files changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index c6b5524add58..19edddba196d 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -907,14 +907,7 @@ static int madvise_inject_error(int behavior,
>  	} else {
>  		pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
>  				pfn, start);
> -		/*
> -		 * Drop the page reference taken by get_user_pages_fast(). In
> -		 * the absence of MF_COUNT_INCREASED the memory_failure()
> -		 * routine is responsible for pinning the page to prevent it
> -		 * from being released back to the page allocator.
> -		 */
> -		put_page(page);
> -		ret = memory_failure(pfn, 0);
> +		ret = memory_failure(pfn, MF_COUNT_INCREASED);
>  	}
>
>  	if (ret)
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 869ece2a1de2..ba861169c9ae 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1269,6 +1269,12 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>  	if (!cookie)
>  		goto out;
>
> +	if (flags & MF_COUNT_INCREASED)
> +		/*
> +		 * Drop the extra refcount in case we come from madvise().
> +		 */
> +		put_page(page);
> +

Should this if-block come before the dax_lock_page() block? It seems that if
dax_lock_page() returns NULL, memory_failure_dev_pagemap() returns without
releasing the refcount. memory_failure() on dev_pagemap doesn't use the page
refcount (unlike other types of memory), so we can release it unconditionally.

Thanks,
Naoya Horiguchi
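
To make the ordering concern above concrete, here is a minimal userspace sketch. It is not kernel code and not a proposed patch: struct toy_page, toy_put_page(), toy_lock_page(), TOY_MF_COUNT_INCREASED and both handlers are hypothetical stand-ins that only model a reference count and a lock attempt that may fail. It shows that dropping the reference after the lock check leaves it held on the early-exit path, while dropping it first does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only a reference count (assumption). */
struct toy_page {
	int refcount;
};

/* Hypothetical flag that plays the role of MF_COUNT_INCREASED. */
#define TOY_MF_COUNT_INCREASED 0x1

/* Drop one reference; the count must be positive beforehand. */
static void toy_put_page(struct toy_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/* Hypothetical stand-in for the lock step: false models a NULL cookie. */
static bool toy_lock_page(bool lock_succeeds)
{
	return lock_succeeds;
}

/* Ordering as in the quoted hunk: reference dropped after the lock check. */
static int toy_handler_put_after_lock(struct toy_page *page, int flags,
				      bool lock_succeeds)
{
	if (!toy_lock_page(lock_succeeds))
		return -1;	/* early exit: caller's reference is still held */

	if (flags & TOY_MF_COUNT_INCREASED)
		toy_put_page(page);
	return 0;
}

/* Ordering suggested in the reply: reference dropped before the lock check. */
static int toy_handler_put_before_lock(struct toy_page *page, int flags,
				       bool lock_succeeds)
{
	if (flags & TOY_MF_COUNT_INCREASED)
		toy_put_page(page);

	if (!toy_lock_page(lock_succeeds))
		return -1;	/* early exit: reference already dropped */
	return 0;
}
```

In this model both handlers behave the same when the lock succeeds; they differ only when it fails, which is the path the question is about.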