On 7/16/19 5:14 PM, Ralph Campbell wrote:
> When a ZONE_DEVICE private page is freed, the page->mapping field can be
> set. If this page is reused as an anonymous page, the previous value can
> prevent the page from being inserted into the CPU's anon rmap table.
> For example, when migrating a pte_none() page to device memory:
>   migrate_vma(ops, vma, start, end, src, dst, private)
>     migrate_vma_collect()
>       src[] = MIGRATE_PFN_MIGRATE
>     migrate_vma_prepare()
>       /* no page to lock or isolate so OK */
>     migrate_vma_unmap()
>       /* no page to unmap so OK */
>     ops->alloc_and_copy()
>       /* driver allocates ZONE_DEVICE page for dst[] */
>     migrate_vma_pages()
>       migrate_vma_insert_page()
>         page_add_new_anon_rmap()
>           __page_set_anon_rmap()
>             /* This check sees the page's stale mapping field */
>             if (PageAnon(page))
>               return
>             /* page->mapping is not updated */
>
> The result is that the migration appears to succeed but a subsequent CPU
> fault will be unable to migrate the page back to system memory or worse.
>
> Clear the page->mapping field when freeing the ZONE_DEVICE page so stale
> pointer data doesn't affect future page use.
>
> Fixes: b7a523109fb5c9d2d6dd ("mm: don't clear ->mapping in hmm_devmem_free")
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Ralph Campbell <rcampbell@xxxxxxxxxx>
> Cc: Christoph Hellwig <hch@xxxxxx>
> Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Jason Gunthorpe <jgg@xxxxxxxxxxxx>
> Cc: Logan Gunthorpe <logang@xxxxxxxxxxxx>
> Cc: Ira Weiny <ira.weiny@xxxxxxxxx>
> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
> Cc: Jan Kara <jack@xxxxxxx>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxx>
> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> Cc: "Jérôme Glisse" <jglisse@xxxxxxxxxx>
> ---
>  kernel/memremap.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/kernel/memremap.c b/kernel/memremap.c
> index bea6f887adad..238ae5d0ae8a 100644
> --- a/kernel/memremap.c
> +++ b/kernel/memremap.c
> @@ -408,6 +408,10 @@ void __put_devmap_managed_page(struct page *page)
>
>  	mem_cgroup_uncharge(page);
>
> +	/* Clear anonymous page mapping to prevent stale pointers */

This is sufficiently complex that some concise form of the documentation
you've put in the commit description needs to also exist right here, as a
comment. How's this read:

diff --git a/kernel/memremap.c b/kernel/memremap.c
index 238ae5d0ae8a..e52e9da5d0a7 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -408,7 +408,27 @@ void __put_devmap_managed_page(struct page *page)
 
 	mem_cgroup_uncharge(page);
 
-	/* Clear anonymous page mapping to prevent stale pointers */
+	/*
+	 * When a device_private page is freed, the page->mapping field
+	 * may still contain a (stale) mapping value. For example, the
+	 * lower bits of page->mapping may still identify the page as an
+	 * anonymous page. Ultimately, this entire field is just stale
+	 * and wrong, and it will cause errors if not cleared. One
+	 * example is:
+	 *
+	 *  migrate_vma_pages()
+	 *    migrate_vma_insert_page()
+	 *      page_add_new_anon_rmap()
+	 *        __page_set_anon_rmap()
+	 *          ...checks page->mapping, via PageAnon(page) call,
+	 *          and incorrectly concludes that the page is an
+	 *          anonymous page. Therefore, it incorrectly,
+	 *          silently fails to set up the new anon rmap.
+	 *
+	 * For other types of ZONE_DEVICE pages, migration is either
+	 * handled differently or not done at all, so there is no need
+	 * to clear page->mapping.
+	 */
 	if (is_device_private_page(page))
 		page->mapping = NULL;

?

thanks,
--
John Hubbard
NVIDIA