On 7/19/19 12:29 PM, Ralph Campbell wrote: > When a ZONE_DEVICE private page is freed, the page->mapping field can be > set. If this page is reused as an anonymous page, the previous value can > prevent the page from being inserted into the CPU's anon rmap table. > For example, when migrating a pte_none() page to device memory: > migrate_vma(ops, vma, start, end, src, dst, private) > migrate_vma_collect() > src[] = MIGRATE_PFN_MIGRATE > migrate_vma_prepare() > /* no page to lock or isolate so OK */ > migrate_vma_unmap() > /* no page to unmap so OK */ > ops->alloc_and_copy() > /* driver allocates ZONE_DEVICE page for dst[] */ > migrate_vma_pages() > migrate_vma_insert_page() > page_add_new_anon_rmap() > __page_set_anon_rmap() > /* This check sees the page's stale mapping field */ > if (PageAnon(page)) > return > /* page->mapping is not updated */ > > The result is that the migration appears to succeed but a subsequent CPU > fault will be unable to migrate the page back to system memory or worse. > > Clear the page->mapping field when freeing the ZONE_DEVICE page so stale > pointer data doesn't affect future page use. Reviewed-by: John Hubbard <jhubbard@xxxxxxxxxx> thanks, -- John Hubbard NVIDIA > > Fixes: b7a523109fb5c9d2d6dd ("mm: don't clear ->mapping in hmm_devmem_free") > Cc: stable@xxxxxxxxxxxxxxx > Signed-off-by: Ralph Campbell <rcampbell@xxxxxxxxxx> > Cc: Christoph Hellwig <hch@xxxxxx> > Cc: Dan Williams <dan.j.williams@xxxxxxxxx> > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > Cc: Jason Gunthorpe <jgg@xxxxxxxxxxxx> > Cc: Logan Gunthorpe <logang@xxxxxxxxxxxx> > Cc: Ira Weiny <ira.weiny@xxxxxxxxx> > Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx> > Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> > Cc: Jan Kara <jack@xxxxxxx> > Cc: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx> > Cc: Michal Hocko <mhocko@xxxxxxxx> > Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx> > Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx> > Cc: "Jérôme Glisse" <jglisse@xxxxxxxxxx> > --- > kernel/memremap.c | 24 ++++++++++++++++++++++++ > 1 file changed, 24 insertions(+) > > diff --git a/kernel/memremap.c b/kernel/memremap.c > index bea6f887adad..98d04466dcde 100644 > --- a/kernel/memremap.c > +++ b/kernel/memremap.c > @@ -408,6 +408,30 @@ void __put_devmap_managed_page(struct page *page) > > mem_cgroup_uncharge(page); > > + /* > + * When a device_private page is freed, the page->mapping field > + * may still contain a (stale) mapping value. For example, the > + * lower bits of page->mapping may still identify the page as > + * an anonymous page. Ultimately, this entire field is just > + * stale and wrong, and it will cause errors if not cleared. > + * One example is: > + * > + * migrate_vma_pages() > + * migrate_vma_insert_page() > + * page_add_new_anon_rmap() > + * __page_set_anon_rmap() > + * ...checks page->mapping, via PageAnon(page) call, > + * and incorrectly concludes that the page is an > + * anonymous page. Therefore, it incorrectly, > + * silently fails to set up the new anon rmap. > + * > + * For other types of ZONE_DEVICE pages, migration is either > + * handled differently or not done at all, so there is no need > + * to clear page->mapping. > + */ > + if (is_device_private_page(page)) > + page->mapping = NULL; > + > page->pgmap->ops->page_free(page); > } else if (!count) > __put_page(page); >