On Fri, Sep 03, 2021 at 11:31:01AM +0200, David Hildenbrand wrote: > On 03.09.21 00:21, George G. Davis wrote: > > On Tue, Aug 31, 2021 at 04:53:31PM +0200, David Hildenbrand wrote: > > > On 23.08.21 22:28, George G. Davis wrote: > > > > From: "George G. Davis" <davis.george@xxxxxxxxxxx> > > > > > > > > Some test_pages_isolated failure conditions don't include trace points. > > > > For debugging issues caused by "pinned" pages, make sure to trace all > > > > calls whether they succeed or fail. In this case, a failure case did not > > > > result in a trace point. So add the missing failure case in > > > > test_pages_isolated traces. > > > > > > In which setups did you actually run into these cases? > > > > Good question! > > > > Although I'm not 100% certain that this specific failure condition has > > occurred in my recent testing, I'm able to reproduce cma_alloc -EBUSY > > faiure conditions when testing latest/recent master on arm64 based > > Renesas R-Car Starter Kit [1] using defconfig with > > CONFIG_CMA_SIZE_MBYTES=384 while running the following test case: > > Okay, I think you are not hitting the path you touched in this patch, > because I assume it will never ever really trigger ... I didn't have any confidence that this path was triggered but if for some reason it does occur there would be no trace point for this case, hence this patch. > > trace-cmd record -N 192.168.1.87:12345 -b 4096 -e cma -e page_isolation -e compaction -e migrate & > > sleep 10 > > while true; do a=$(( ( RANDOM % 10000 ) + 1 )); echo $a > /sys/kernel/debug/cma/cma-reserved/alloc && (usleep $a; echo $a > /sys/kernel/debug/cma/cma-reserved/free); done & > > while true; do b=$(( ( RANDOM % 10000 ) + 1 )); echo $b > /sys/kernel/debug/cma/cma-reserved/alloc && (usleep $b; echo $b > /sys/kernel/debug/cma/cma-reserved/free); done & > > while true; do c=$(( ( RANDOM % 10000 ) + 1 )); echo $c > /sys/kernel/debug/cma/cma-reserved/alloc && (usleep $c; echo $c > /sys/kernel/debug/cma/cma-reserved/free); done & > > while true; do d=$(( ( RANDOM % 10000 ) + 1 )); echo $d > /sys/kernel/debug/cma/cma-reserved/alloc && (usleep $d; echo $d > /sys/kernel/debug/cma/cma-reserved/free); done & > > while true; do e=$(( ( RANDOM % 10000 ) + 1 )); echo $e > /sys/kernel/debug/cma/cma-reserved/alloc && (usleep $e; echo $e > /sys/kernel/debug/cma/cma-reserved/free); done & > > /selftests/vm/transhuge-stress & > > > > The cma_alloc -EBUSY failures are caused by THP compound pages allocated > > from the CMA region where migration does not seem to work for compound > > THP pages. The work around is to disable CONFIG_TRANSPARENT_HUGEPAGE > > since it seems incompatible with the intended use of the CMA region. > > > Oh, that sounds broken, THP should not block CMA allocation or page > migration for other purposes. Yup, CMA+THP is not well behaved based on my observations. > a) Are these temporary or permanent allocation errors? If they are > permanent, they will also break memory unplug. It's difficult for me to understand but, with encouragement from Eugeniu Rosca, I was able to create a simplified test case which I believe demonstrates the problem that we're also observing during normal system operation w/o the use of synthetic, possibly bogus, test cases. See below for the possibly bogus synthetic test case... > b) Did you reproduce on other architectures as well? I've only tested arm64 based Renesas R-Car Starter Kit on v4.14 and master based kernels where the same problem is easily reproducible. > c) Did it use to work but is now broken? IOW, did you try bisecting? I have not identified any versions where this is/was known to work but there is a lot of anecdotal evidence that problems such as this have been an ongoing concern, e.g. [1][2][3]. [1] https://lwn.net/Articles/636234/ [2] https://lwn.net/Articles/619865/ [3] https://lwn.net/Articles/684611/ To demonstrate the cma_alloc -EBUSY failure case for THP compound pages, test_pages_isolated and transhuge-stress are modified as follows: diff --git a/mm/page_isolation.c b/mm/page_isolation.c index fff55bb830f9..c38607dcc304 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -316,5 +316,8 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn, out: trace_test_pages_isolated(start_pfn, end_pfn, pfn); + if (ret) + dump_page(pfn_to_page(pfn), "test_pages_isolated: -EBUSY"); + return ret; } diff --git a/tools/testing/selftests/vm/transhuge-stress.c b/tools/testing/selftests/vm/transhuge-stress.c index 5e4c036f6ad3..4c5bf35fafb4 100644 --- a/tools/testing/selftests/vm/transhuge-stress.c +++ b/tools/testing/selftests/vm/transhuge-stress.c @@ -113,6 +113,9 @@ int main(int argc, char **argv) if (pfn < 0) { nr_failed++; + } else if (pfn == 516096) { + warnx("Got pfn==0x%lx, please proceed with testing of ~64MiB allocation at end of default CMA region now. Meanwhile, we will wait here in a loop to hold the THP mapping in CMA default region", pfn); + do { } while (1); } else { size_t idx = pfn >> (HPAGE_SHIFT - PAGE_SHIFT); With the above changes, the goal of the test case is to leave a large ~320MiB region free at the start of the CMA default region, which can be allocated for CMA use via a large ~320MiB allocation, then execute transhuge-stress to allocate THP mappings at the halfway mark within the final ~64MiB region of the default CMA region, followed by an attempt to allocate a ~64MiB CMA allocation that is expected to fail due to the THP mapping, e.g.: root@rcar-gen3:~# cat /sys/kernel/debug/cma/cma-reserved/base_pfn 425984 root@rcar-gen3:~# cat /sys/kernel/debug/cma/cma-reserved/maxchunk 97182 root@rcar-gen3:~# echo 81920 > /sys/kernel/debug/cma/cma-reserved/alloc root@rcar-gen3:~# /selftests/vm/transhuge-stress transhuge-stress: allocate 1874 transhuge pages, using 3749 MiB virtual memory and 14 MiB of ram transhuge-stress: Got pfn==0x7e000, please proceed with testing of ~64MiB allocation at end of default CMA region now. Meanwhile, we will wait here in a loop to hold the THP mapping in CMA default region ^Z [2]+ Stopped(SIGTSTP) /selftests/vm/transhuge-stress root@rcar-gen3:~# bg [2] /selftests/vm/transhuge-stress & root@rcar-gen3:~# cat /sys/kernel/debug/cma/cma-reserved/maxchunk 15262 root@rcar-gen3:~# echo 15262 > /sys/kernel/debug/cma/cma-reserved/alloc [ 365.002327] page:fffffc0000f18000 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff6ba00 pfn:0x7c600 [ 365.012865] head:fffffc0000f18000 order:9 compound_mapcount:0 compound_pincount:0 [ 365.020707] memcg:ffff0004cc8a2000 [ 365.024380] anon flags: 0x3fffc000009001c(uptodate|dirty|lru|head|swapbacked|node=0|zone=0|lastcpupid=0xffff) [ 365.034741] raw: 03fffc000009001c fffffc0000828008 fffffc0000830008 ffff0004c8e33001 [ 365.042703] raw: 0000000ffff6ba00 0000000000000000 00000001ffffffff ffff0004cc8a2000 [ 365.050708] page dumped because: test_pages_isolated: -EBUSY [ 365.056554] page_owner tracks the page as allocated [ 365.061649] page last allocated via order 9, migratetype Movable, gfp_mask 0x13d24ca(GFP_TRANSHUGE|__GFP_NORETRY|__GFP_THISNODE), pid 783, ts 344253286264, free_ts 330526665485 [ 365.077846] post_alloc_hook+0x14c/0x218 [ 365.081941] prep_new_page+0x28/0x68 [ 365.085595] get_page_from_freelist+0x1ec8/0x2560 [ 365.090370] __alloc_pages+0x270/0x388 [ 365.094186] alloc_pages_vma+0x1d4/0x278 [ 365.098176] do_huge_pmd_anonymous_page+0x124/0xb50 [ 365.103120] __handle_mm_fault+0x5d8/0xdd8 [ 365.107285] handle_mm_fault+0xc8/0x200 [ 365.111187] do_page_fault+0x1f4/0x4c8 [ 365.115005] do_translation_fault+0x90/0xc0 [ 365.119254] do_mem_abort+0x40/0xb0 [ 365.122808] el0_da+0x70/0x200 [ 365.125881] el0t_64_sync_handler+0x68/0xb8 [ 365.130124] el0t_64_sync+0x180/0x184 [ 365.133849] page last free stack trace: [ 365.137753] free_pcp_prepare+0x200/0x4c8 [ 365.141853] free_unref_page+0x28/0x200 [ 365.145777] __free_pages+0x114/0x158 [ 365.149526] free_contig_range+0x68/0xc0 [ 365.153534] alloc_contig_range+0x368/0x418 [ 365.157803] cma_alloc+0x270/0x868 [ 365.161297] cma_alloc_write+0x50/0x98 [ 365.165131] simple_attr_write+0x10c/0x1c8 [ 365.169315] debugfs_attr_write+0x50/0x78 [ 365.173414] full_proxy_write+0x64/0x98 [ 365.177336] vfs_write+0xc8/0x570 [ 365.180733] ksys_write+0x6c/0x100 [ 365.184229] __arm64_sys_write+0x1c/0x28 [ 365.188273] invoke_syscall+0x44/0x100 [ 365.192124] el0_svc_common+0x100/0x120 [ 365.196048] do_el0_svc+0x28/0x90 [ 365.200114] cma: cma_alloc: reserved: alloc failed, req-size: 15262 pages, ret: -16 -sh: echo: write error: Cannot allocate memory root@rcar-gen3:~# fg /selftests/vm/transhuge-stress ^C root@rcar-gen3:~# echo 15262 > /sys/kernel/debug/cma/cma-reserved/alloc root@rcar-gen3:~# echo 15262 > /sys/kernel/debug/cma/cma-reserved/free root@rcar-gen3:~# echo 81920 > /sys/kernel/debug/cma/cma-reserved/free In the above test results, the second CMA allocation fails due to the presence of THP mappings within the final ~64MiB of the default CMA memory region. This may not be an entirely realistic test case but I believe that it demonstrates the same failure case that we've observed under normal runtime conditions where we observe the same cma_alloc -EBUSY failure conditions when THP is enabled. I have a rather ugly test_pages_isolated trace point patch that emulates dump_page support where it has shown the same page struct information as the above synthetic test case results during normal runtime conditions where the same problem has been observed, i.e. THP compound page mappings are blocking CMA allocations, which is rather unfortunate bad behavior. Thanks! > > -- > Thanks, > > David / dhildenb > -- Regards, George