Re: [PATCH] mm/memremap: Introduce pgmap_request_folio() using pgmap offsets

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2022-10-20 19:17, Dan Williams wrote:
Felix Kuehling wrote:
Am 2022-10-20 um 17:56 schrieb Dan Williams:
A 'struct dev_pagemap' (pgmap) represents a collection of ZONE_DEVICE
pages. The pgmap is a reference counted object that serves a similar
role as a 'struct request_queue'. Live references are obtained for each
in flight request / page, and once a page's reference count drops to
zero the associated pin of the pgmap is dropped as well. While a page is
idle nothing should be accessing it because that is effectively a
use-after-free situation. Unfortunately, all current ZONE_DEVICE
implementations deploy a layering violation to manage requests to
activate pages owned by a pgmap. Specifically, they take steps like walk
the pfns that were previously assigned at memremap_pages() time and use
pfn_to_page() to recall metadata like page->pgmap, or make use of other
data like page->zone_device_data.

The first step towards correcting that situation is to provide a
API to get access to a pgmap page that does not require the caller to
know the pfn, nor access any fields of an idle page. Ideally this API
would be able to support dynamic page creation instead of the current
status quo of pre-allocating and initializing pages.
If I understand it correctly, the current code works because the struct
pages are pre-allocated. Is the end-goal here to make the struct page
allocation for ZONE_DEVICE pages dynamic.
Some DEVICE_PRIVATE users have already open-coded their own coarse
grained dynamic ZONE_DEVICE pages by waiting to allocate chunks on
demand.

The users that would benefit from a general dynamic ZONE_DEVICE facility
are cases like VMs backed by device-dax instances. Unless the VM calls
for bare metal services there is no need to map pages for the device-dax
instance in the hypervisor.

So, the end goal here is to just add some sanity to ZONE_DEVICE page
reference counting to allow for requiring an arbitration for page access
rather than just pfn_to_page() and assuming the page is already there.
Dynamic ZONE_DEVICE becomes something that is possible once that sanity
is in place.

On a prompt from Jason, introduce pgmap_request_folio() that operates on
an offset into a pgmap.
This looks like it would make it fairly easy to request larger (higher
order) folios for physically contiguous device memory allocations in the
future.


   It replaces the shortlived
pgmap_request_folios() that was continuing the layering violation of
assuming pages are available to be consulted before asking the pgmap to
make them available.

For now this only converts the callers to lookup the pgmap and generate
the pgmap offset, but it does not do the deeper cleanup of teaching
those call sites to generate those arguments without walking the page
metadata. For next steps it appears the DEVICE_PRIVATE implementations
could plumb the pgmap into the necessary callsites and switch to using
gen_pool_alloc() to track which offsets of a pgmap are allocated.
Wouldn't that duplicate whatever device memory allocator we already have
in our driver? Couldn't I just take the memory allocation from our TTM
allocator and make necessary pgmap_request_folio calls to allocate the
corresponding pages from the pgmap?
I think you could, as long as the output from that allocator is a
pgmap_offset rather than a pfn.

Or does the pgmap allocation need a finer granularity than the device
memory allocation?
I would say the pgmap *allocation* happens at memremap_pages() time.
pgmap_request_folio() is a request to put a pgmap page into service.

So, yes, I think you can bring your own allocator for what offsets are
in/out of service in pgmap space.

Thank you for the explanation. The patch is

Acked-by: Felix Kuehling <Felix.Kuehling@xxxxxxx>






[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux