On Thu, Aug 1, 2024 at 5:15 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 31.07.24 08:49, Chris Li wrote:
> > This is the short term solutions "swap cluster order" listed
> > in my "Swap Abstraction" discussion slice 8 in the recent
> > LSF/MM conference.
> >
>
> Running the cow.c selftest on mm/mm-unstable, I get:

Hi David, thanks very much for the test and report!

>
> # [RUN] Basic COW after fork() with mprotect() optimization ... with swapped-out, PTE-mapped THP (1024 kB)
> [ 51.865309] Oops: general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] PREEMPT SMP NOPTI
> [ 51.867738] CPU: 21 UID: 0 PID: 282 Comm: kworker/21:1 Not tainted 6.11.0-rc1+ #11
> [ 51.869566] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
> [ 51.871298] Workqueue: events swap_discard_work
> [ 51.872211] RIP: 0010:__free_cluster+0x27/0x90
> [ 51.873101] Code: 90 90 90 0f 1f 44 00 00 8b 0d 8d 95 96 01 55 48 89 fd 53 48 89 f3 85 c9 75 3a 48 8b 43 50 48 8b 4b 48 48 8d 53 48 48 83 c5 60 <48> 89 41 08 48 89 08 48 8b 45 08 48 89 55 08 48 89 43 50 48 89 6b
> [ 51.876720] RSP: 0018:ffffa3dcc0aafdc8 EFLAGS: 00010286
> [ 51.877752] RAX: dead000000000122 RBX: ffff8e7ed9686e00 RCX: dead000000000100
> [ 51.879186] RDX: ffff8e7ed9686e48 RSI: ffff8e7ed9686e18 RDI: ffff8e7ec37831c0
> [ 51.880577] RBP: ffff8e7ec5d10860 R08: 0000000000000001 R09: 0000000000000028
> [ 51.881972] R10: 0000000000000200 R11: 00000000000004cb R12: ffff8e7ed9686e00
> [ 51.883393] R13: 0000000000028200 R14: 0000000000028000 R15: 0000000000000000
> [ 51.884827] FS: 0000000000000000(0000) GS:ffff8e822f480000(0000) knlGS:0000000000000000
> [ 51.886412] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 51.887532] CR2: 00007f37d7e17840 CR3: 0000000335a3a001 CR4: 0000000000770ef0
> [ 51.888931] PKRU: 55555554
> [ 51.889471] Call Trace:
> [ 51.889964] <TASK>
> [ 51.890391] ? __die_body.cold+0x19/0x27
> [ 51.891174] ? die_addr+0x3c/0x60
> [ 51.891824] ? exc_general_protection+0x14f/0x430
> [ 51.892754] ? asm_exc_general_protection+0x26/0x30
> [ 51.893717] ? __free_cluster+0x27/0x90
> [ 51.894483] ? __free_cluster+0x7e/0x90
> [ 51.895245] swap_do_scheduled_discard+0x142/0x1b0
> [ 51.896189] swap_discard_work+0x26/0x30
> [ 51.896958] process_one_work+0x211/0x5a0
> [ 51.897750] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 51.898693] worker_thread+0x1c9/0x3c0
> [ 51.899438] ? __pfx_worker_thread+0x10/0x10
> [ 51.900287] kthread+0xe3/0x110
> [ 51.900913] ? __pfx_kthread+0x10/0x10
> [ 51.901656] ret_from_fork+0x34/0x50
> [ 51.902377] ? __pfx_kthread+0x10/0x10
> [ 51.903114] ret_from_fork_asm+0x1a/0x30
> [ 51.903896] </TASK>
>
>
> Maybe related to this series?

Right, I can reproduce your problem and I believe this patch can fix it, see the attachment.

Hi Andrew, can you pick this patch too?

Attachment:
0001-SQUASH-Fix-discard-of-full-cluster.patch
Description: Binary data