On Thu, Sep 30, 2021 at 06:54:05PM +0100, Joao Martins wrote: > On 9/30/21 04:01, Alistair Popple wrote: > > On Thursday, 30 September 2021 5:34:05 AM AEST Jason Gunthorpe wrote: > >> On Wed, Sep 29, 2021 at 12:50:15PM +0100, Joao Martins wrote: > >> > >>>> If the get_dev_pagemap has to remain then it just means we have to > >>>> flush before changing pagemap pointers > >>> Right -- I don't think we should need it as that discussion on the other > >>> thread goes. > >>> > >>> OTOH, using @pgmap might be useful to unblock gup-fast FOLL_LONGTERM > >>> for certain devmap types[0] (like MEMORY_DEVICE_GENERIC [device-dax] > >>> can support it but not MEMORY_DEVICE_FSDAX [fsdax]). > >> > >> When looking at Logan's patches I think it is pretty clear to me that > >> page->pgmap must never be a dangling pointer if the caller has a > >> legitimate refcount on the page. > >> > >> For instance the migrate and stuff all blindly calls > >> is_device_private_page() on the struct page expecting a valid > >> page->pgmap. > >> > >> This also looks like it is happening, ie > >> > >> void __put_page(struct page *page) > >> { > >> if (is_zone_device_page(page)) { > >> put_dev_pagemap(page->pgmap); > >> > >> Is indeed putting the pgmap ref back when the page becomes ungettable. > >> > >> This properly happens when the page refcount goes to zero and so it > >> should fully interlock with __page_cache_add_speculative(): > >> > >> if (unlikely(!page_ref_add_unless(page, count, 0))) { > >> > >> Thus, in gup.c, if we succeed at try_grab_compound_head() then > >> page->pgmap is a stable pointer with a valid refcount. > >> > >> So, all the external pgmap stuff in gup.c is completely pointless. > >> try_grab_compound_head() provides us with an equivalent protection at > >> lower cost. Remember gup.c doesn't deref the pgmap at all. > >> > >> Dan/Alistair/Felix do you see any hole in that argument?? > > > > As background note that device pages are currently considered free when > > refcount == 1 but the pgmap reference is dropped when the refcount transitions > > 1->0. The final pgmap reference is typically dropped when a driver calls > > memunmap_pages() and put_page() drops the last page reference: > > > > void memunmap_pages(struct dev_pagemap *pgmap) > > { > > unsigned long pfn; > > int i; > > > > dev_pagemap_kill(pgmap); > > for (i = 0; i < pgmap->nr_range; i++) > > for_each_device_pfn(pfn, pgmap, i) > > put_page(pfn_to_page(pfn)); > > dev_pagemap_cleanup(pgmap); > > > > If there are still pgmap references dev_pagemap_cleanup(pgmap) will block until > > the final reference is dropped. So I think your argument holds at least for > > DEVICE_PRIVATE and DEVICE_GENERIC. DEVICE_FS_DAX defines it's own pagemap > > cleanup but I can't see why the same argument wouldn't hold there - if a page > > has a valid refcount it must have a reference on the pagemap too. > > IIUC Dan's reasoning was that fsdax wasn't able to deal with > surprise removal [1] so his patches were to ensure fsdax (or the > pmem block device) poisons/kills the pages as a way to notify > filesystem/dm that the page was to be kept unmapped: Sure, but that has nothing to do with GUP, that is between the filesytem and fsdax > But if fsdax doesn't wait for all the pgmap references[*] on its > pagemap cleanup callback then what's the pgmap ref in > __gup_device_huge() pairs/protects us up against that is specific to > fsdax? It does wait for refs It sets the pgmap.ref to: pmem->pgmap.ref = &q->q_usage_counter; And that ref is incr'd by the struct page lifetime - the unincr is in __put_page() above fsdax_pagemap_ops does pmem_pagemap_kill() which calls blk_freeze_queue_start() which does percpu_ref_kill(). Then the pmem_pagemap_cleanup() eventually does blk_mq_freeze_queue_wait() which will sleep until the prefcpu ref reaches zero. In other words fsdax cannot pass cleanup while a struct page exists with a non-zero refcount, which answers Alistair's question about how fsdax's cleanup work. Jason