On Thu, Sep 30, 2021 at 01:01:14PM +1000, Alistair Popple wrote: > On Thursday, 30 September 2021 5:34:05 AM AEST Jason Gunthorpe wrote: > > On Wed, Sep 29, 2021 at 12:50:15PM +0100, Joao Martins wrote: > > > > > > If the get_dev_pagemap has to remain then it just means we have to > > > > flush before changing pagemap pointers > > > Right -- I don't think we should need it as that discussion on the other > > > thread goes. > > > > > > OTOH, using @pgmap might be useful to unblock gup-fast FOLL_LONGTERM > > > for certain devmap types[0] (like MEMORY_DEVICE_GENERIC [device-dax] > > > can support it but not MEMORY_DEVICE_FSDAX [fsdax]). > > > > When looking at Logan's patches I think it is pretty clear to me that > > page->pgmap must never be a dangling pointer if the caller has a > > legitimate refcount on the page. > > > > For instance the migrate and stuff all blindly calls > > is_device_private_page() on the struct page expecting a valid > > page->pgmap. > > > > This also looks like it is happening, ie > > > > void __put_page(struct page *page) > > { > > if (is_zone_device_page(page)) { > > put_dev_pagemap(page->pgmap); > > > > Is indeed putting the pgmap ref back when the page becomes ungettable. > > > > This properly happens when the page refcount goes to zero and so it > > should fully interlock with __page_cache_add_speculative(): > > > > if (unlikely(!page_ref_add_unless(page, count, 0))) { > > > > Thus, in gup.c, if we succeed at try_grab_compound_head() then > > page->pgmap is a stable pointer with a valid refcount. > > > > So, all the external pgmap stuff in gup.c is completely pointless. > > try_grab_compound_head() provides us with an equivalent protection at > > lower cost. Remember gup.c doesn't deref the pgmap at all. > > > > Dan/Alistair/Felix do you see any hole in that argument?? > > As background note that device pages are currently considered free when > refcount == 1 but the pgmap reference is dropped when the refcount transitions > 1->0. The final pgmap reference is typically dropped when a driver calls > memunmap_pages() and put_page() drops the last page reference: > > void memunmap_pages(struct dev_pagemap *pgmap) > { > unsigned long pfn; > int i; > > dev_pagemap_kill(pgmap); > for (i = 0; i < pgmap->nr_range; i++) > for_each_device_pfn(pfn, pgmap, i) > put_page(pfn_to_page(pfn)); > dev_pagemap_cleanup(pgmap); > > If there are still pgmap references dev_pagemap_cleanup(pgmap) will block until > the final reference is dropped. So I think your argument holds at least for > DEVICE_PRIVATE and DEVICE_GENERIC. DEVICE_FS_DAX defines it's own pagemap > cleanup but I can't see why the same argument wouldn't hold there - if a page > has a valid refcount it must have a reference on the pagemap too. To close this circle - the issue is use after free on the struct page * entry while it has a zero ref. memunmap_pages() does wait for the refcount to go to zero, but it then goes on to free the memory under the struct pages. However there are possibly still untracked references to this memory in the page tables. This is the bug Dan has been working on - to shootdown page table mappings before getting to memunmap_pages() Getting the page map ref will make the use-after-free never crash, just be a silent security problem. :( Jason