All,

Some progress updates:

When the issue occurs, calling __drain_all_pages() lets offline_pages() escape from the loop.

>
> Jun 28 15:29:26 linux kernel: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7980dd
> Jun 28 15:29:26 linux kernel: flags: 0x9fffffc0000000(node=2|zone=3|lastcpupid=0x1fffff)
> Jun 28 15:29:26 linux kernel: raw: 009fffffc0000000 ffffdfbd9e603788 ffffd4f0ffd97ef0 0000000000000000
> Jun 28 15:29:26 linux kernel: raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> Jun 28 15:29:26 linux kernel: page dumped because: trouble page...
>

Given the contents of this problematic page structure, the
list_head = {ffffdfbd9e603788, ffffd4f0ffd97ef0} appears to be valid. I guess
it was linked into the pcp_list, so I dumped per_cpu_pages[cpu].count at
several critical points. An example is below:

offline_pages()
{
        // per_cpu_pages[1].count = 0
        zone_pcp_disable() // will call __drain_all_pages()
        // per_cpu_pages[1].count = 188
        do {
                do {
                        scan_movable_pages()
                        ret = do_migrate_range()
                } while (!ret)

                ret = test_pages_isolated()

                if (is the 1st iteration)
                        // per_cpu_pages[1].count = 182

                if (issue occurs) { /* if the loop takes longer than 10 seconds */
                        // per_cpu_pages[1].count = 61
                        __drain_all_pages()
                        // per_cpu_pages[1].count = 0
                        /* will escape from the outer loop in later iterations */
                }
        } while (ret)
}

Some interesting points:
 - After the 1st __drain_all_pages(), per_cpu_pages[1].count increased from 0
   to 188. Does this mean it is racing with something?
 - per_cpu_pages[1].count decreases across iterations but never reaches 0.
 - When the issue occurs, calling __drain_all_pages() brings
   per_cpu_pages[1].count down to 0.

So I wonder if it's fine to call __drain_all_pages() in the loop?

Looking forward to your insights.

Thanks
Zhijian

On 01/07/2024 09:25, Zhijian Li (Fujitsu) wrote:
> Hi all
>
>
> Overview:
> While testing CXL memory hot-remove, we noticed that `daxctl offline-memory dax0.0`
> sometimes gets stuck forever. daxctl offline-memory dax0.0 writes "offline" to
> /sys/devices/system/memory/memoryNNN/state.
>
> Workaround:
> When it happens, we can type Ctrl-C to abort it and then retry.
> Then the CXL memory can be offlined successfully.
>
> Where the kernel gets stuck:
> After digging into the kernel, we found that when the issue occurs, the kernel
> is stuck in the outer loop of offline_pages(). Below is an annotated excerpt
> of offline_pages():
>
> ```
> int __ref offline_pages()
> {
>   do { // outer loop
>     pfn = start_pfn;
>     do {
>       ret = scan_movable_pages(pfn, end_pfn, &pfn); // It returns -ENOENT
>       if (!ret)
>         do_migrate_range(pfn, end_pfn); // Not reached
>     } while (!ret);
>     ret = test_pages_isolated(start_pfn, end_pfn, MEMORY_OFFLINE);
>   } while (ret); // ret is -EBUSY
> }
> ```
>
> In this case, we dumped the first page that cannot be isolated (see dump_page below); its
> content does not change between iterations:
> ```
> Jun 28 15:29:26 linux kernel: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7980dd
> Jun 28 15:29:26 linux kernel: flags: 0x9fffffc0000000(node=2|zone=3|lastcpupid=0x1fffff)
> Jun 28 15:29:26 linux kernel: raw: 009fffffc0000000 ffffdfbd9e603788 ffffd4f0ffd97ef0 0000000000000000
> Jun 28 15:29:26 linux kernel: raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> Jun 28 15:29:26 linux kernel: page dumped because: trouble page...
> ```
>
> Every time the issue occurs, the content of the page structure is similar.
>
> Questions:
> Q1. Is this behavior expected? At least for an OS administrator, it should return
>     promptly (success or failure) instead of hanging indefinitely.
> Q2. Regarding the offline_pages() function, encountering such a page indeed causes
>     an endless loop. Shouldn't another part of the kernel have changed the state
>     of this page in a timely manner?
>
> When I use the workaround mentioned above (Ctrl-C and try offline again), I find
> that the page state changes (see dump_page below):
> ```
> Jun 28 15:33:12 linux kernel: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7980dd
> Jun 28 15:33:12 linux kernel: flags: 0x9fffffc0000000(node=2|zone=3|lastcpupid=0x1fffff)
> Jun 28 15:33:12 linux kernel: raw: 009fffffc0000000 dead000000000100 dead000000000122 0000000000000000
> Jun 28 15:33:12 linux kernel: raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> Jun 28 15:33:12 linux kernel: page dumped because: previous trouble page
> ```
>
> What our test does:
> We have a CXL memory device, which is configured as kmem and onlined into the MOVABLE
> zone as NUMA node 2. We run two processes, consume-memory and offline-memory, in parallel;
> see the pseudo code below:
>
> ```
> main()
> {
>   if (fork() == 0)
>     numactl -m 2 ./consume-memory
>   else {
>     daxctl offline-memory dax0.0
>     wait()
>   }
> }
> ```
>
> Attached is the process information (when it gets stuck):
> ```
> root 25716  0.0 0.0 2460 1408 pts/0 S+ 15:28 0:00 ./main
> root 25719  0.0 0.0    0    0 pts/0 Z+ 15:28 0:00 [consume-memory] <defunct>
> root 25720 98.6 0.0 9476 3740 pts/0 R+ 15:28 0:26 daxctl offline-memory /dev/dax0.0
> ```
>
> Feel free to let me know if you need more details.
> Thank you for your attention to this issue. Looking forward to your insights.
>
> Thanks
> Zhijian