On 9/30/21 04:01, Alistair Popple wrote: > On Thursday, 30 September 2021 5:34:05 AM AEST Jason Gunthorpe wrote: >> On Wed, Sep 29, 2021 at 12:50:15PM +0100, Joao Martins wrote: >> >>>> If the get_dev_pagemap has to remain then it just means we have to >>>> flush before changing pagemap pointers >>> Right -- I don't think we should need it as that discussion on the other >>> thread goes. >>> >>> OTOH, using @pgmap might be useful to unblock gup-fast FOLL_LONGTERM >>> for certain devmap types[0] (like MEMORY_DEVICE_GENERIC [device-dax] >>> can support it but not MEMORY_DEVICE_FSDAX [fsdax]). >> >> When looking at Logan's patches I think it is pretty clear to me that >> page->pgmap must never be a dangling pointer if the caller has a >> legitimate refcount on the page. >> >> For instance the migrate and stuff all blindly calls >> is_device_private_page() on the struct page expecting a valid >> page->pgmap. >> >> This also looks like it is happening, ie >> >> void __put_page(struct page *page) >> { >> if (is_zone_device_page(page)) { >> put_dev_pagemap(page->pgmap); >> >> Is indeed putting the pgmap ref back when the page becomes ungettable. >> >> This properly happens when the page refcount goes to zero and so it >> should fully interlock with __page_cache_add_speculative(): >> >> if (unlikely(!page_ref_add_unless(page, count, 0))) { >> >> Thus, in gup.c, if we succeed at try_grab_compound_head() then >> page->pgmap is a stable pointer with a valid refcount. >> >> So, all the external pgmap stuff in gup.c is completely pointless. >> try_grab_compound_head() provides us with an equivalent protection at >> lower cost. Remember gup.c doesn't deref the pgmap at all. >> >> Dan/Alistair/Felix do you see any hole in that argument?? > > As background note that device pages are currently considered free when > refcount == 1 but the pgmap reference is dropped when the refcount transitions > 1->0. The final pgmap reference is typically dropped when a driver calls > memunmap_pages() and put_page() drops the last page reference: > > void memunmap_pages(struct dev_pagemap *pgmap) > { > unsigned long pfn; > int i; > > dev_pagemap_kill(pgmap); > for (i = 0; i < pgmap->nr_range; i++) > for_each_device_pfn(pfn, pgmap, i) > put_page(pfn_to_page(pfn)); > dev_pagemap_cleanup(pgmap); > > If there are still pgmap references dev_pagemap_cleanup(pgmap) will block until > the final reference is dropped. So I think your argument holds at least for > DEVICE_PRIVATE and DEVICE_GENERIC. DEVICE_FS_DAX defines it's own pagemap > cleanup but I can't see why the same argument wouldn't hold there - if a page > has a valid refcount it must have a reference on the pagemap too. IIUC Dan's reasoning was that fsdax wasn't able to deal with surprise removal [1] so his patches were to ensure fsdax (or the pmem block device) poisons/kills the pages as a way to notify filesystem/dm that the page was to be kept unmapped: [1] https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.stgit@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/ But if fsdax doesn't wait for all the pgmap references[*] on its pagemap cleanup callback then what's the pgmap ref in __gup_device_huge() pairs/protects us up against that is specific to fsdax? I am not sure I follow how both the fsdax specific issue ties in with this pgmap ref being there above. Joao [*] or at least fsdax_pagemap_ops doesn't suggest the contrary ... compared to dev_pagemap_{kill,cleanup}