On Tue, Jan 14, 2025 at 03:59:31PM +0100, David Hildenbrand wrote:
> On 10.01.25 07:00, Alistair Popple wrote:
> > Zone device pages are used to represent various type of device memory
> > managed by device drivers. Currently compound zone device pages are
> > not supported. This is because MEMORY_DEVICE_FS_DAX pages are the only
> > user of higher order zone device pages and have their own page
> > reference counting.
> >
> > A future change will unify FS DAX reference counting with normal page
> > reference counting rules and remove the special FS DAX reference
> > counting. Supporting that requires compound zone device pages.
> >
> > Supporting compound zone device pages requires compound_head() to
> > distinguish between head and tail pages whilst still preserving the
> > special struct page fields that are specific to zone device pages.
> >
> > A tail page is distinguished by having bit zero being set in
> > page->compound_head, with the remaining bits pointing to the head
> > page. For zone device pages page->compound_head is shared with
> > page->pgmap.
> >
> > The page->pgmap field is common to all pages within a memory section.
> > Therefore pgmap is the same for both head and tail pages and can be
> > moved into the folio and we can use the standard scheme to find
> > compound_head from a tail page.
>
> The more relevant thing is that the pgmap field must be common to all pages
> in a folio, even if a folio exceeds memory sections (e.g., 128 MiB on x86_64
> where we have 1 GiB folios).

Thanks for pointing that out. I had assumed folios couldn't cross a memory
section. Obviously that is wrong so I've updated the commit message accordingly.

- Alistair

> > Signed-off-by: Alistair Popple <apopple@xxxxxxxxxx>
> > Reviewed-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Reviewed-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> >
> > ---
> >
> > Changes for v4:
> >  - Fix build breakages reported by kernel test robot
> >
> > Changes since v2:
> >
> >  - Indentation fix
> >  - Rename page_dev_pagemap() to page_pgmap()
> >  - Rename folio _unused field to _unused_pgmap_compound_head
> >  - s/WARN_ON/VM_WARN_ON_ONCE_PAGE/
> >
> > Changes since v1:
> >
> >  - Move pgmap to the folio as suggested by Matthew Wilcox
> > ---
>
> [...]
>
> >   static inline bool folio_is_device_coherent(const struct folio *folio)
> > diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> > index 29919fa..61899ec 100644
> > --- a/include/linux/migrate.h
> > +++ b/include/linux/migrate.h
> > @@ -205,8 +205,8 @@ struct migrate_vma {
> >       unsigned long end;
> >       /*
> > -     * Set to the owner value also stored in page->pgmap->owner for
> > -     * migrating out of device private memory. The flags also need to
> > +     * Set to the owner value also stored in page_pgmap(page)->owner
> > +     * for migrating out of device private memory. The flags also need to
> >        * be set to MIGRATE_VMA_SELECT_DEVICE_PRIVATE.
> >        * The caller should always set this field when using mmu notifier
> >        * callbacks to avoid device MMU invalidations for device private
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index df8f515..54b59b8 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -129,8 +129,11 @@ struct page {
> >               unsigned long compound_head; /* Bit zero is set */
> >           };
> >           struct { /* ZONE_DEVICE pages */
> > -            /** @pgmap: Points to the hosting device page map. */
> > -            struct dev_pagemap *pgmap;
> > +            /*
> > +             * The first word is used for compound_head or folio
> > +             * pgmap
> > +             */
> > +            void *_unused_pgmap_compound_head;
> >               void *zone_device_data;
> >               /*
> >                * ZONE_DEVICE private pages are counted as being
> > @@ -299,6 +302,7 @@ typedef struct {
> >    * @_refcount: Do not access this member directly. Use folio_ref_count()
> >    * to find how many references there are to this folio.
> >    * @memcg_data: Memory Control Group data.
> > + * @pgmap: Metadata for ZONE_DEVICE mappings
> >    * @virtual: Virtual address in the kernel direct map.
> >    * @_last_cpupid: IDs of last CPU and last process that accessed the folio.
> >    * @_entire_mapcount: Do not use directly, call folio_entire_mapcount().
> > @@ -337,6 +341,7 @@ struct folio {
> >       /* private: */
> >               };
> >       /* public: */
> > +            struct dev_pagemap *pgmap;
>
> Agreed, that should work.
>
> Acked-by: David Hildenbrand <david@xxxxxxxxxx>
>
> --
> Cheers,
>
> David / dhildenb
>
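
For readers following along, here is a rough userspace sketch of the scheme the
commit message describes: the first word of a tail page holds the head page
address with bit zero set, while the first word of the head page is free to
hold the pgmap pointer. This is only a model under stated assumptions, not the
kernel code. The names model_page, word0, model_prep_compound() and the other
model_*() helpers are hypothetical, and struct dev_pagemap is reduced to a
single field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for the kernel's struct dev_pagemap. */
struct dev_pagemap {
	const void *owner;
};

/* Hypothetical model of the ZONE_DEVICE view of struct page. */
struct model_page {
	uintptr_t word0;	/* head: pgmap pointer; tail: head address | 1 */
	void *zone_device_data;
};

/* A tail page is marked by bit zero of the first word being set. */
static int model_is_tail(const struct model_page *page)
{
	return (page->word0 & (uintptr_t)1) != 0;
}

/* Strip the tag bit to recover the head page; a head page maps to itself. */
static struct model_page *model_compound_head(struct model_page *page)
{
	if (model_is_tail(page))
		return (struct model_page *)(page->word0 - 1);
	return page;
}

/* Model of a page_pgmap()-style lookup: always read pgmap via the head. */
static struct dev_pagemap *model_page_pgmap(struct model_page *page)
{
	return (struct dev_pagemap *)model_compound_head(page)->word0;
}

/* Set up one compound page: pgmap in the head, tagged links in the tails. */
static void model_prep_compound(struct model_page *pages, unsigned int nr,
				struct dev_pagemap *pgmap)
{
	unsigned int i;

	/* Both pointers must be at least 2-byte aligned so bit zero is free. */
	assert(((uintptr_t)pgmap & 1) == 0);
	assert(((uintptr_t)&pages[0] & 1) == 0);

	pages[0].word0 = (uintptr_t)pgmap;
	for (i = 1; i < nr; i++)
		pages[i].word0 = (uintptr_t)&pages[0] | (uintptr_t)1;
}
```

In this model every page of the compound page resolves to the same pgmap no
matter how many pages it spans, which matches the point above that pgmap must
be common to all pages in a folio rather than to a memory section.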