Re: [PATCH v2 16/18] mm/memremap_pages: Support initializing pages to a zero reference count

Dan Williams <dan.j.williams@xxxxxxxxx> writes:

> Jason Gunthorpe wrote:
>> On Thu, Sep 15, 2022 at 08:36:43PM -0700, Dan Williams wrote:
>> > The initial memremap_pages() implementation inherited the
>> > __init_single_page() default of pages starting life with an elevated
>> > reference count. This originally allowed for the page->pgmap pointer to
>> > alias with the storage for page->lru since a page was only allowed to be
>> > on an lru list when its reference count was zero.
>> >
>> > Since then, 'struct page' definition cleanups have arranged for
>> > dedicated space for the ZONE_DEVICE page metadata, and the
>> > MEMORY_DEVICE_{PRIVATE,COHERENT} work has arranged for the 1 -> 0
>> > page->_refcount transition to route the page to free_zone_device_page()
>> > and not the core-mm page-free. With those cleanups in place and with
>> > filesystem-dax and device-dax now converted to take and drop references
>> > at map and truncate time, it is possible to start MEMORY_DEVICE_FS_DAX
>> > and MEMORY_DEVICE_GENERIC reference counts at 0.
>> >
>> > MEMORY_DEVICE_{PRIVATE,COHERENT} still expect that their ZONE_DEVICE
>> > pages start life at _refcount 1, so make that the default if
>> > pgmap->init_mode is left at zero.
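>> >
>> > In other words the initialization policy is roughly (an illustrative
>> > sketch, not the literal patch hunk):
>> >
>> >         /* init_mode == 0: keep the historical elevated count */
>> >         if (pgmap->init_mode == 0)
>> >                 set_page_count(page, 1);  /* PRIVATE / COHERENT */
>> >         else
>> >                 set_page_count(page, 0);  /* FS_DAX / GENERIC */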
>>
>> I'm shocked to read this - how does it make any sense?
>
> I think what happened is that since memremap_pages() historically
> produced pages with an elevated reference count that GPU drivers skipped
> taking a reference on first allocation and just passed along an elevated
> reference count page to the first user.
>
> So either we keep that assumption or update all users to be prepared for
> idle pages coming out of memremap_pages().
>
> This is all in reaction to the "set_page_count(page, 1);" in
> free_zone_device_page(), which I am happy to get rid of, but I need
> help from the MEMORY_DEVICE_{PRIVATE,COHERENT} folks to react to
> memremap_pages() starting all pages at reference count 0.
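>
> For reference, the tail of free_zone_device_page() currently does
> (abridged from mm/memremap.c):
>
>         page->mapping = NULL;
>         page->pgmap->ops->page_free(page);
>
>         /*
>          * Reset the page count to 1 to prepare for handing out the
>          * page again.
>          */
>         set_page_count(page, 1);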

This is all rather good timing - this week I've been in the middle of
putting together a series which fixes this, among other things. So I'm
all for fixing it and can help with that. My motivation was that I
needed to be able to tell whether a page is free or not with
get_page_unless_zero() etc., which doesn't work at the moment because
free device private/coherent pages have an elevated refcount.
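
i.e. with free device pages sitting at refcount 0 this becomes the
usual race-free "is this page busy?" test (minimal sketch):

        if (get_page_unless_zero(page)) {
                /* page is in use; we took a reference, so drop it */
                put_page(page);
        } else {
                /* _refcount was 0, i.e. the device page is free */
        }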

 - Alistair

>> dev_pagemap_ops->page_free() is only called on the 1->0 transition, so
>> any driver which implements it must be expecting pages to have a 0
>> refcount.
>>
>> Looking around, everything but fsdax_pagemap_ops implements
>> page_free().
>
> Right.
>
>> So, how does it work? Surely the instant the page map is created all
>> the pages must be considered 'free', and after page_free() is called I
>> would also expect the page to be considered free.
>
> The GPU drivers need to increment reference counts when they hand out
> the page rather than reuse the reference count that they get by default.
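>
> Concretely, instead of relying on the implicit reference from
> memremap_pages(), the allocation path would do something like this
> (pop_free_page() is a stand-in for the driver's free-list handling):
>
>         dpage = pop_free_page(mdevice);
>         init_page_count(dpage);         /* 0 -> 1: mark the page busy */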
>
>> How on earth can a free'd page have both a 0 and 1 refcount??
>
> This is residual wonkiness from memremap_pages() handing out pages with
> elevated reference counts at the outset.
>
>> e.g. look at the simple hmm_test: it threads pages onto the
>> mdevice->free_pages list immediately after memremap_pages() and then
>> again inside page_free() - it is completely wrong that they would have
>> different refcounts while on the free_pages list.
>
> I do not see any page_ref_inc() in that test, only put_page(), so it
> is assuming non-idle pages at the outset.
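>
> For reference, the test's page_free() just threads the page back onto
> the free list without touching the refcount (abridged from
> lib/test_hmm.c):
>
>         static void dmirror_devmem_free(struct page *page)
>         {
>                 struct dmirror_device *mdevice = dmirror_page_to_device(page);
>
>                 spin_lock(&mdevice->lock);
>                 mdevice->cfree++;
>                 page->zone_device_data = mdevice->free_pages;
>                 mdevice->free_pages = page;
>                 spin_unlock(&mdevice->lock);
>         }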
>
>> I would expect that after the page is removed from the free_pages
>> list it will have its refcount set to 1 to make it non-free, then it
>> will go through the migration.
>>
>> Alistair how should the refcounting be working here in hmm_test?
>>
>> Jason


