On Wed, Sep 21, 2022 at 04:45:22PM -0700, Dan Williams wrote: > Jason Gunthorpe wrote: > > On Thu, Sep 15, 2022 at 08:36:43PM -0700, Dan Williams wrote: > > > The initial memremap_pages() implementation inherited the > > > __init_single_page() default of pages starting life with an elevated > > > reference count. This originally allowed for the page->pgmap pointer to > > > alias with the storage for page->lru since a page was only allowed to be > > > on an lru list when its reference count was zero. > > > > > > Since then, 'struct page' definition cleanups have arranged for > > > dedicated space for the ZONE_DEVICE page metadata, and the > > > MEMORY_DEVICE_{PRIVATE,COHERENT} work has arranged for the 1 -> 0 > > > page->_refcount transition to route the page to free_zone_device_page() > > > and not the core-mm page-free. With those cleanups in place and with > > > filesystem-dax and device-dax now converted to take and drop references > > > at map and truncate time, it is possible to start MEMORY_DEVICE_FS_DAX > > > and MEMORY_DEVICE_GENERIC reference counts at 0. > > > > > > MEMORY_DEVICE_{PRIVATE,COHERENT} still expect that their ZONE_DEVICE > > > pages start life at _refcount 1, so make that the default if > > > pgmap->init_mode is left at zero. > > > > I'm shocked to read this - how does it make any sense? > > I think what happened is that since memremap_pages() historically > produced pages with an elevated reference count that GPU drivers skipped > taking a reference on first allocation and just passed along an elevated > reference count page to the first user. > > So either we keep that assumption or update all users to be prepared for > idle pages coming out of memremap_pages(). > > This is all in reaction to the "set_page_count(page, 1);" in > free_zone_device_page(). Which I am happy to get rid of but need from > help from MEMORY_DEVICE_{PRIVATE,COHERENT} folks to react to > memremap_pages() starting all pages at reference count 0. But, but this is all racy, it can't do this: + if (pgmap->ops && pgmap->ops->page_free) + pgmap->ops->page_free(page); /* + * Reset the page count to the @init_mode value to prepare for + * handing out the page again. */ + if (pgmap->init_mode == INIT_PAGEMAP_BUSY) + set_page_count(page, 1); after the fact! Something like that hmm_test has already threaded the "freed" page into the free list via ops->page_free(), it can't have a 0 ref count and be on the free list, even temporarily :( Maybe it nees to be re-ordered? > > How on earth can a free'd page have both a 0 and 1 refcount?? > > This is residual wonkiness from memremap_pages() handing out pages with > elevated reference counts at the outset. I think the answer to my question is the above troubled code where we still set the page refcount back to 1 even in the page_free path, so there is some consistency "a freed paged may have a refcount of 1" for the driver. So, I guess this patch makes sense but I would put more noise around INIT_PAGEMAP_BUSY (eg annotate every driver that is using it with the explicit constant) and alert people that they need to fix their stuff to get rid of it. We should definately try to fix hmm_test as well so people have a good reference code to follow in fixing the other drivers :( Jason