The patch titled
     Subject: memory-failure: fetch compound_head after pgmap_pfn_valid()
has been added to the -mm tree.  Its filename is
     memory-failure-fetch-compound_head-after-pgmap_pfn_valid.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/memory-failure-fetch-compound_head-after-pgmap_pfn_valid.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/memory-failure-fetch-compound_head-after-pgmap_pfn_valid.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Joao Martins <joao.m.martins@xxxxxxxxxx>
Subject: memory-failure: fetch compound_head after pgmap_pfn_valid()

Patch series "mm, device-dax: Introduce compound pages in devmap", v6.

This series converts device-dax to use compound pages, and moves away from
the 'struct page per basepage on PMD/PUD' that is done today.

Doing so 1) unlocks a few noticeable improvements in unpin_user_pages()
and makes the device-dax+altmap case 4x faster in pinning (numbers below
and in the last patch), and 2) as mentioned in various other threads, is
an important step towards cleaning up ZONE_DEVICE refcounting.

I've split the compound pages on devmap part from the rest based on recent
discussions on devmap pending and future work planned[5][6].  There is
consensus that device-dax should be using compound pages to represent its
PMD/PUDs just like HugeTLB and THP, and that leads to less specialization
of the dax parts.
I will pursue the rest of the work in parallel once this part is merged,
in particular the GUP-{slow,fast} improvements [7] and the tail struct
page deduplication memory savings part[8].

To summarize what the series does:

Patch 1: Prepare hwpoisoning to work with dax compound pages.

Patches 2-3: Split the current utility function of prep_compound_page()
into head and tail and use those two helpers where appropriate to take
advantage of caches being warm after __init_single_page().  This is used
when initializing zone device when we bring up device-dax namespaces.

Patches 4-10: Add devmap support for compound pages in device-dax.
memmap_init_zone_device() initializes its metadata as compound pages, and
it introduces a new devmap property known as vmemmap_shift which outlines
how the vmemmap is structured (defaults to base pages as done today).  The
property essentially describes the page order of the metadata.  While at
it, do a few cleanups in device-dax in patches 5-9.  Finally, enable
device-dax usage of devmap @vmemmap_shift with a value based on its own
@align property.  @vmemmap_shift is 0 by default (which is today's case of
base pages in devmap, like fsdax or the others) and the usage of compound
devmap is optional.  Starting with device-dax (*not* fsdax) we enable it
by default.  There are a few pinning improvements, particularly in the
unpinning case and with altmap, as well as
unpin_user_page_range_dirty_lock() being just as effective as with
THP/hugetlb[0] pages.

$ gup_test -f /dev/dax1.0 -m 16384 -r 10 -S -a -n 512 -w
(pin_user_pages_fast 2M pages) put:~71 ms -> put:~22 ms
[altmap]
(pin_user_pages_fast 2M pages) get:~524ms put:~525 ms -> get: ~127ms put:~71ms

$ gup_test -f /dev/dax1.0 -m 129022 -r 10 -S -a -n 512 -w
(pin_user_pages_fast 2M pages) put:~513 ms -> put:~188 ms
[altmap with -m 127004]
(pin_user_pages_fast 2M pages) get:~4.1 secs put:~4.12 secs -> get:~1sec put:~563ms

Tested on x86 with 1TB+ of pmem (alongside registering it with RDMA, with
and without altmap), alongside gup_test selftests with dynamic dax regions
and static dax regions, coupled with ndctl unit tests for dynamic dax
devices that exercise all of this.  Note that for dynamic dax regions I
had to revert commit 8aa83e6395 ("x86/setup: Call early_reserve_memory()
earlier"); it is a known issue that this commit broke efi_fake_mem=.

This patch (of 10):

memory_failure_dev_pagemap() at the moment assumes base pages (e.g.
dax_lock_page()).  For devmap with compound pages, fetch the compound_head
in case a tail page memory failure is being handled.

Currently this is a nop, but with the advent of compound pages in
dev_pagemap it allows memory_failure_dev_pagemap() to keep working.
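
As an illustration of why resolving the head page matters, here is a minimal userspace sketch of the head/tail lookup.  It is not kernel code: struct toy_page, toy_prep_compound(), toy_compound_head() and toy_lock_for_failure() are hypothetical names made up for this example, and the "locked" field only stands in for whatever per-page state the failure handler touches.  It assumes the common encoding in which a tail page stores a pointer to its head with bit 0 set, while head and base pages leave that bit clear.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for struct page; NOT the real kernel layout. */
struct toy_page {
	uintptr_t compound_head;	/* tail: (head pointer | 1), else 0 */
	int locked;			/* hypothetical per-page state */
};

/* Return the head of a compound page, or the page itself otherwise. */
static struct toy_page *toy_compound_head(struct toy_page *page)
{
	uintptr_t head = page->compound_head;

	if (head & 1)
		return (struct toy_page *)(head - 1);
	return page;
}

/* Link 2^order contiguous pages into one compound page. */
static void toy_prep_compound(struct toy_page *pages, unsigned int order)
{
	size_t i, nr = (size_t)1 << order;

	/* The encoding needs bit 0 of the head address to be free. */
	assert(((uintptr_t)&pages[0] & 1) == 0);

	pages[0].compound_head = 0;
	for (i = 1; i < nr; i++)
		pages[i].compound_head = (uintptr_t)&pages[0] | 1;
}

/*
 * Same shape as the patched path: resolve the head first, then operate
 * on it, so that a failure reported on a tail page acts on the head.
 */
static struct toy_page *toy_lock_for_failure(struct toy_page *page)
{
	struct toy_page *head = toy_compound_head(page);

	head->locked = 1;
	return head;
}
```

With 512 toy pages prepared at order 9 (the number of 4 KiB base pages in a 2M mapping, assuming 4 KiB base pages), calling toy_lock_for_failure() on any tail page marks and returns the first page of the array.  On a standalone page the lookup returns the page itself, which matches the "currently this is a nop" remark above.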

Link: https://lkml.kernel.org/r/20211124191005.20783-1-joao.m.martins@xxxxxxxxxx
Link: https://lkml.kernel.org/r/20211124191005.20783-2-joao.m.martins@xxxxxxxxxx
Signed-off-by: Joao Martins <joao.m.martins@xxxxxxxxxx>
Reported-by: Jane Chu <jane.chu@xxxxxxxxxx>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
Reviewed-by: Dan Williams <dan.j.williams@xxxxxxxxx>
Reviewed-by: Muchun Song <songmuchun@xxxxxxxxxxxxx>
Cc: Vishal Verma <vishal.l.verma@xxxxxxxxx>
Cc: Dave Jiang <dave.jiang@xxxxxxxxx>
Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Cc: Jason Gunthorpe <jgg@xxxxxxxx>
Cc: John Hubbard <jhubbard@xxxxxxxxxx>
Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
Cc: Jonathan Corbet <corbet@xxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>
Cc: Jason Gunthorpe <jgg@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/memory-failure.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/mm/memory-failure.c~memory-failure-fetch-compound_head-after-pgmap_pfn_valid
+++ a/mm/memory-failure.c
@@ -1558,6 +1558,12 @@ static int memory_failure_dev_pagemap(un
 	}
 
 	/*
+	 * Pages instantiated by device-dax (not filesystem-dax)
+	 * may be compound pages.
+	 */
+	page = compound_head(page);
+
+	/*
 	 * Prevent the inode from being freed while we are interrogating
 	 * the address_space, typically this would be handled by
 	 * lock_page(), but dax pages do not use the page lock. This
_

Patches currently in -mm which might be from joao.m.martins@xxxxxxxxxx are

memory-failure-fetch-compound_head-after-pgmap_pfn_valid.patch
mm-page_alloc-split-prep_compound_page-into-head-and-tail-subparts.patch
mm-page_alloc-refactor-memmap_init_zone_device-page-init.patch
mm-memremap-add-zone_device-support-for-compound-pages.patch
device-dax-use-align-for-determining-pgoff.patch
device-dax-use-struct_size.patch
device-dax-ensure-dev_dax-pgmap-is-valid-for-dynamic-devices.patch
device-dax-factor-out-page-mapping-initialization.patch
device-dax-set-mapping-prior-to-vmf_insert_pfn_pmdpud.patch
device-dax-compound-devmap-support.patch