On Thu, Sep 17, 2020 at 10:10:42AM +0200, Oscar Salvador wrote:
> This patchset includes some fixups (patch#1, patch#2 and patch#3)
> and some cleanups (patch#4-7).
>
> Patch#1 is a fix to take HWPoison pages off a buddy freelist since
> it can lead us to having HWPoison pages back in the game without anyone
> noticing it.
> So fix it (we did that already for soft_offline_page [1]).
>
> Patch#2 is fixing a rebasing problem that made the call
> to page_handle_poison from _soft_offline_page set the
> wrong value for hugepage_or_freepage. [2]
>
> Patch#3 is not really a fixup, but tries to re-handle a page
> in case it was allocated under us.

Thanks for the update.
This patchset triggers the following BUG_ON() with Aristeu's workload:

[ 1010.400900] Soft offlining pfn 0xbff8c at process virtual address 0x7fe6c99c8000
[ 1010.402931] page:00000000f5670686 refcount:1 mapcount:-128 mapping:0000000000000000 index:0x1 pfn:0xbff89
[ 1010.405604] flags: 0xfffe000800000(hwpoison)
[ 1010.406755] raw: 000fffe000800000 ffffcddf029ab848 ffffcddf02ff9448 0000000000000000
[ 1010.408824] raw: 0000000000000001 0000000000000000 00000001ffffff7f 0000000000000000
[ 1010.410877] page dumped because: VM_BUG_ON_PAGE(page_count(buddy) != 0)
[ 1010.412673] ------------[ cut here ]------------
[ 1010.413930] kernel BUG at mm/page_alloc.c:800!
[ 1010.415143] invalid opcode: 0000 [#1] SMP PTI
[ 1010.416320] CPU: 3 PID: 1340 Comm: kworker/3:0 Not tainted 5.9.0-rc2-mm1-v5.9-rc2-200917-1952-00212-gf1a0765b04cb+ #33
[ 1010.419101] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
[ 1010.422645] Workqueue: mm_percpu_wq drain_local_pages_wq
[ 1010.424075] RIP: 0010:__free_one_page+0x552/0x580
[ 1010.425344] Code: 48 c7 c6 90 6c 0f 84 4c 89 e7 e8 69 7e fd ff 0f 0b 0f 1f 44 00 00 e9 e5 fc ff ff 48 c7 c6 c8 f3 11 84 4c 89 f7 e8 4e 7e fd ff <0f> 0b 83 fb 08 0f 86 cb fc ff ff 48 83 c4 20 5b 5d 41 5c 41 5d 41
[ 1010.430231] RSP: 0018:ffffaa96c171fda0 EFLAGS: 00010082
[ 1010.431651] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
[ 1010.433598] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8dc8bbd18d08
[ 1010.435627] RBP: 00000000000bff88 R08: ffff8dc8bbd18d00 R09: 6573756163656220
[ 1010.437544] R10: 6163656220646570 R11: 6d75642065676170 R12: ffffcddf02ffe200
[ 1010.439376] R13: 00000000000bff89 R14: ffffcddf02ffe240 R15: ffff8dc7bffd5680
[ 1010.441271] FS: 0000000000000000(0000) GS:ffff8dc8bbd00000(0000) knlGS:0000000000000000
[ 1010.443349] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1010.444892] CR2: 00007f6b69f92000 CR3: 0000000139c4a000 CR4: 00000000001506e0
[ 1010.446746] Call Trace:
[ 1010.447424] free_pcppages_bulk+0x1d4/0x2c0
[ 1010.448553] drain_pages_zone+0x42/0x50
[ 1010.449585] drain_local_pages_wq+0xe/0x10
[ 1010.450702] process_one_work+0x1b0/0x360
[ 1010.451769] worker_thread+0x50/0x3a0
[ 1010.452940] ? process_one_work+0x360/0x360
[ 1010.454072] kthread+0xfe/0x140
[ 1010.454989] ? kthread_park+0x90/0x90
[ 1010.455970] ret_from_fork+0x22/0x30

This message seems to show that pages being moved to the buddy
allocator still have a nonzero refcount. Could you check how the
changes from v3 to v4 could cause this?

Here's my reproducer.

[build1:~]$ cat test_ksm_madv_soft.c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
#include <sys/types.h>
#include <errno.h>
#include <stdlib.h>

#define MADV_SOFT_OFFLINE 101

#define err(x) perror(x),exit(EXIT_FAILURE)

int main() {
	int ret;
	int size = 100000*0x1000;
	char *p1 = mmap(NULL, size, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	printf("p1 %p\n", p1);
	char *p2 = mmap(NULL, size, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	printf("p2 %p\n", p2);

	ret = madvise(p1, size, MADV_MERGEABLE);
	printf("madvise(p1) %d\n", ret);
	ret = madvise(p2, size, MADV_MERGEABLE);
	printf("madvise(p2) %d\n", ret);

	printf("writing p1 ... ");
	memset(p1, 'a', size);
	printf("done\n");
	printf("writing p2 ... ");
	memset(p2, 'a', size);
	printf("done\n");

	usleep(10000000);

	printf("soft offline\n");
	ret = madvise(p1, size, MADV_SOFT_OFFLINE);
	printf("soft offline returns %d\n", ret);
	if (ret)
		err("madvise");

	madvise(p1, size, MADV_UNMERGEABLE);
	madvise(p2, size, MADV_UNMERGEABLE);
	printf("OK\n");
}

[build1:~/upstream/mm_regression/lib]$ cat tmp_run_ksm_madv.sh
rm test_ksm_madv_soft 2> /dev/null
gcc -o test_ksm_madv_soft test_ksm_madv_soft.c || exit 1
echo 0 > /sys/kernel/mm/ksm/sleep_millisecs
echo 100000 > /sys/kernel/mm/ksm/pages_to_scan
echo 100000 > /sys/kernel/mm/ksm/max_page_sharing
echo 2 > /sys/kernel/mm/ksm/run
echo 1 > /sys/kernel/mm/ksm/run
./test_ksm_madv_soft

Thanks,
Naoya Horiguchi
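
As a side note on reading the page dump in the trace, here is a small
user-space sketch (not kernel code) of how the raw word 00000001ffffff7f
can be decoded and how a buddy pfn is derived. The helper names are
hypothetical. The constants PAGE_TYPE_BASE and PG_buddy are assumed to
match include/linux/page-flags.h in v5.9, and the split of the raw word
into page_type (low 32 bits) and _refcount (high 32 bits) assumes the
x86_64 struct page layout; both should be checked against the tree under
test. Treating pfn 0xbff88 (the value in RBP) as the page being freed at
order 0 is only a guess from the register dump.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values from v5.9 include/linux/page-flags.h; verify locally. */
#define PAGE_TYPE_BASE 0xf0000000u
#define PG_buddy       0x00000080u

/* Buddy of a 2^order block: flip bit 'order' of the pfn. */
static unsigned long find_buddy_pfn(unsigned long pfn, unsigned int order)
{
	return pfn ^ (1UL << order);
}

/* A page type is "set" when its bit is cleared while the base bits stay set. */
static bool page_type_is_buddy(uint32_t page_type)
{
	return (page_type & (PAGE_TYPE_BASE | PG_buddy)) == PAGE_TYPE_BASE;
}

/* Low half of the raw word: page_type/_mapcount (layout assumption). */
static uint32_t raw_page_type(uint64_t raw)
{
	return (uint32_t)raw;
}

/* High half of the raw word: _refcount (layout assumption). */
static uint32_t raw_refcount(uint64_t raw)
{
	return (uint32_t)(raw >> 32);
}
```

Under those assumptions, 0x00000001ffffff7f decodes to a page_type with
the buddy bit set and a refcount of 1, and find_buddy_pfn(0xbff88, 0)
gives 0xbff89, the pfn of the dumped page. That would be consistent with
a page sitting on a buddy freelist while still holding a reference, which
is the condition VM_BUG_ON_PAGE(page_count(buddy) != 0) rejects.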