Jason Gunthorpe wrote: > On Wed, Sep 21, 2022 at 04:45:22PM -0700, Dan Williams wrote: > > Jason Gunthorpe wrote: > > > On Thu, Sep 15, 2022 at 08:36:43PM -0700, Dan Williams wrote: > > > > The initial memremap_pages() implementation inherited the > > > > __init_single_page() default of pages starting life with an elevated > > > > reference count. This originally allowed for the page->pgmap pointer to > > > > alias with the storage for page->lru since a page was only allowed to be > > > > on an lru list when its reference count was zero. > > > > > > > > Since then, 'struct page' definition cleanups have arranged for > > > > dedicated space for the ZONE_DEVICE page metadata, and the > > > > MEMORY_DEVICE_{PRIVATE,COHERENT} work has arranged for the 1 -> 0 > > > > page->_refcount transition to route the page to free_zone_device_page() > > > > and not the core-mm page-free. With those cleanups in place and with > > > > filesystem-dax and device-dax now converted to take and drop references > > > > at map and truncate time, it is possible to start MEMORY_DEVICE_FS_DAX > > > > and MEMORY_DEVICE_GENERIC reference counts at 0. > > > > > > > > MEMORY_DEVICE_{PRIVATE,COHERENT} still expect that their ZONE_DEVICE > > > > pages start life at _refcount 1, so make that the default if > > > > pgmap->init_mode is left at zero. > > > > > > I'm shocked to read this - how does it make any sense? > > > > I think what happened is that since memremap_pages() historically > > produced pages with an elevated reference count that GPU drivers skipped > > taking a reference on first allocation and just passed along an elevated > > reference count page to the first user. > > > > So either we keep that assumption or update all users to be prepared for > > idle pages coming out of memremap_pages(). > > > > This is all in reaction to the "set_page_count(page, 1);" in > > free_zone_device_page(). Which I am happy to get rid of but need from > > help from MEMORY_DEVICE_{PRIVATE,COHERENT} folks to react to > > memremap_pages() starting all pages at reference count 0. > > But, but this is all racy, it can't do this: > > + if (pgmap->ops && pgmap->ops->page_free) > + pgmap->ops->page_free(page); > > /* > + * Reset the page count to the @init_mode value to prepare for > + * handing out the page again. > */ > + if (pgmap->init_mode == INIT_PAGEMAP_BUSY) > + set_page_count(page, 1); > > after the fact! Something like that hmm_test has already threaded the > "freed" page into the free list via ops->page_free(), it can't have a > 0 ref count and be on the free list, even temporarily :( > > Maybe it nees to be re-ordered? > > > > How on earth can a free'd page have both a 0 and 1 refcount?? > > > > This is residual wonkiness from memremap_pages() handing out pages with > > elevated reference counts at the outset. > > I think the answer to my question is the above troubled code where we > still set the page refcount back to 1 even in the page_free path, so > there is some consistency "a freed paged may have a refcount of 1" for > the driver. > > So, I guess this patch makes sense but I would put more noise around > INIT_PAGEMAP_BUSY (eg annotate every driver that is using it with the > explicit constant) and alert people that they need to fix their stuff > to get rid of it. Sounds reasonable. > We should definately try to fix hmm_test as well so people have a good > reference code to follow in fixing the other drivers :( Oh, that's a good idea. I can probably fix that up and leave it to the GPU driver folks to catch up with that example so we can kill off INIT_PAGEMAP_BUSY.