Re: [PATCH v2 16/18] mm/memremap_pages: Support initializing pages to a zero reference count

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Jason Gunthorpe wrote:
> On Wed, Sep 21, 2022 at 04:45:22PM -0700, Dan Williams wrote:
> > Jason Gunthorpe wrote:
> > > On Thu, Sep 15, 2022 at 08:36:43PM -0700, Dan Williams wrote:
> > > > The initial memremap_pages() implementation inherited the
> > > > __init_single_page() default of pages starting life with an elevated
> > > > reference count. This originally allowed for the page->pgmap pointer to
> > > > alias with the storage for page->lru since a page was only allowed to be
> > > > on an lru list when its reference count was zero.
> > > > 
> > > > Since then, 'struct page' definition cleanups have arranged for
> > > > dedicated space for the ZONE_DEVICE page metadata, and the
> > > > MEMORY_DEVICE_{PRIVATE,COHERENT} work has arranged for the 1 -> 0
> > > > page->_refcount transition to route the page to free_zone_device_page()
> > > > and not the core-mm page-free. With those cleanups in place and with
> > > > filesystem-dax and device-dax now converted to take and drop references
> > > > at map and truncate time, it is possible to start MEMORY_DEVICE_FS_DAX
> > > > and MEMORY_DEVICE_GENERIC reference counts at 0.
> > > > 
> > > > MEMORY_DEVICE_{PRIVATE,COHERENT} still expect that their ZONE_DEVICE
> > > > pages start life at _refcount 1, so make that the default if
> > > > pgmap->init_mode is left at zero.
> > > 
> > > I'm shocked to read this - how does it make any sense?
> > 
> > I think what happened is that since memremap_pages() historically
> > produced pages with an elevated reference count that GPU drivers skipped
> > taking a reference on first allocation and just passed along an elevated
> > reference count page to the first user.
> > 
> > So either we keep that assumption or update all users to be prepared for
> > idle pages coming out of memremap_pages().
> > 
> > This is all in reaction to the "set_page_count(page, 1);" in
> > free_zone_device_page(). Which I am happy to get rid of but need from
> > help from MEMORY_DEVICE_{PRIVATE,COHERENT} folks to react to
> > memremap_pages() starting all pages at reference count 0.
> 
> But, but this is all racy, it can't do this:
> 
> +	if (pgmap->ops && pgmap->ops->page_free)
> +		pgmap->ops->page_free(page);
>  
>  	/*
> +	 * Reset the page count to the @init_mode value to prepare for
> +	 * handing out the page again.
>  	 */
> +	if (pgmap->init_mode == INIT_PAGEMAP_BUSY)
> +		set_page_count(page, 1);
> 
> after the fact! Something like that hmm_test has already threaded the
> "freed" page into the free list via ops->page_free(), it can't have a
> 0 ref count and be on the free list, even temporarily :(
> 
> Maybe it nees to be re-ordered?
> 
> > > How on earth can a free'd page have both a 0 and 1 refcount??
> > 
> > This is residual wonkiness from memremap_pages() handing out pages with
> > elevated reference counts at the outset.
> 
> I think the answer to my question is the above troubled code where we
> still set the page refcount back to 1 even in the page_free path, so
> there is some consistency "a freed paged may have a refcount of 1" for
> the driver.
> 
> So, I guess this patch makes sense but I would put more noise around
> INIT_PAGEMAP_BUSY (eg annotate every driver that is using it with the
> explicit constant) and alert people that they need to fix their stuff
> to get rid of it.

Sounds reasonable.

> We should definately try to fix hmm_test as well so people have a good
> reference code to follow in fixing the other drivers :(

Oh, that's a good idea. I can probably fix that up and leave it to the
GPU driver folks to catch up with that example so we can kill off
INIT_PAGEMAP_BUSY.



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux