On Mon, 9 May 2022, Mel Gorman wrote:

> Currently the PCP lists are protected by using local_lock_irqsave to
> prevent migration and IRQ reentrancy but this is inconvenient. Remote
> draining of the lists is impossible and a workqueue is required and
> every task allocation/free must disable then enable interrupts which is
> expensive.
>
> As preparation for dealing with both of those problems, protect the
> lists with a spinlock. The IRQ-unsafe version of the lock is used
> because IRQs are already disabled by local_lock_irqsave. spin_trylock
> is used in preparation for a time when local_lock could be used instead
> of lock_lock_irqsave.

8c580f60a145 ("mm/page_alloc: protect PCP lists with a spinlock")
in next-20220520: I haven't looked up whether that comes from a
stable or unstable suburb of akpm's tree.

Mel, the VM_BUG_ON(in_hardirq()) which this adds to free_unref_page_list()
is not valid. I have no appreciation of how important it is to the whole
scheme, but as it stands, it crashes; and when I change it to a warning

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3475,7 +3475,7 @@ void free_unref_page_list(struct list_he
 	if (list_empty(list))
 		return;

-	VM_BUG_ON(in_hardirq());
+	WARN_ON_ONCE(in_hardirq());

 	local_lock_irqsave(&pagesets.lock, flags);

then everything *appears* to go on working correctly after the splat below
(from which you will infer that I'm swapping to nvme):

[ 256.167040] WARNING: CPU: 0 PID: 9842 at mm/page_alloc.c:3478 free_unref_page_list+0x92/0x343
[ 256.170031] CPU: 0 PID: 9842 Comm: cc1 Not tainted 5.18.0-rc7-n20 #3
[ 256.171285] Hardware name: LENOVO 20HQS0EG02/20HQS0EG02, BIOS N1MET54W (1.39 ) 04/16/2019
[ 256.172555] RIP: 0010:free_unref_page_list+0x92/0x343
[ 256.173820] Code: ff ff 49 8b 44 24 08 4d 89 e0 4c 8d 60 f8 eb b6 48 8b 03 48 39 c3 0f 84 af 02 00 00 65 8b 05 72 7f df 7e a9 00 00 0f 00 74 02 <0f> 0b 9c 41 5d fa 41 0f ba e5 09 73 05 e8 1f 0a f9 ff e8 46 90 7b
[ 256.175289] RSP: 0018:ffff88803ec07c80 EFLAGS: 00010006
[ 256.176683] RAX: 0000000080010000 RBX: ffff88803ec07cf8 RCX: 000000000000002c
[ 256.178122] RDX: 0000000000000000 RSI: ffff88803ec29d28 RDI: 0000000000000040
[ 256.179580] RBP: ffff88803ec07cc0 R08: ffff88803ec07cf0 R09: 00000000000a401d
[ 256.181031] R10: 0000000000000000 R11: ffff8880101891b8 R12: ffff88803f6dd600
[ 256.182501] R13: ffff88803ec07cf8 R14: 000000000000000f R15: 0000000000000000
[ 256.183957] FS: 00007ffff7fcfac0(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000
[ 256.185419] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 256.186911] CR2: 0000555555710cdc CR3: 00000000240b4004 CR4: 00000000003706f0
[ 256.188395] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 256.189888] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 256.191390] Call Trace:
[ 256.192844] <IRQ>
[ 256.194253] ? __mem_cgroup_uncharge_list+0x4e/0x57
[ 256.195715] release_pages+0x26f/0x27e
[ 256.197150] ? list_add_tail+0x39/0x39
[ 256.198603] pagevec_lru_move_fn+0x95/0xa4
[ 256.200065] folio_rotate_reclaimable+0xa0/0xd1
[ 256.201545] folio_end_writeback+0x1c/0x78
[ 256.203064] end_page_writeback+0x11/0x13
[ 256.204495] end_swap_bio_write+0x87/0x95
[ 256.205967] bio_endio+0x15e/0x162
[ 256.207393] blk_mq_end_request_batch+0xd2/0x18d
[ 256.208932] ? __this_cpu_preempt_check+0x13/0x15
[ 256.210378] ? lock_is_held_type+0xcf/0x10f
[ 256.211714] ? lock_is_held+0xc/0xe
[ 256.213065] ? rcu_read_lock_sched_held+0x24/0x4f
[ 256.214450] nvme_pci_complete_batch+0x4c/0x51
[ 256.215811] nvme_irq+0x43/0x4e
[ 256.217173] ? nvme_unmap_data+0xb5/0xb5
[ 256.218633] __handle_irq_event_percpu+0xff/0x235
[ 256.220062] handle_irq_event_percpu+0x10/0x39
[ 256.221584] handle_irq_event+0x34/0x53
[ 256.223061] handle_edge_irq+0xb1/0xd5
[ 256.224486] __common_interrupt+0x7a/0xe6
[ 256.225918] common_interrupt+0x9c/0xca
[ 256.227330] </IRQ>
[ 256.228763] <TASK>
[ 256.230226] asm_common_interrupt+0x2c/0x40
[ 256.231724] RIP: 0010:lock_acquire.part.0+0x1a9/0x1b4
[ 256.233190] Code: df ec 7e ff c8 74 19 0f 0b 48 c7 c7 77 2c 4e 82 e8 d7 e7 88 00 65 c7 05 01 df ec 7e 00 00 00 00 48 85 db 74 01 fb 48 8d 65 d8 <5b> 41 5c 41 5d 41 5e 41 5f 5d c3 55 48 89 e5 41 57 4d 89 cf 41 56
[ 256.234804] RSP: 0018:ffff888010e939f0 EFLAGS: 00000206
[ 256.236339] RAX: 0000000000000000 RBX: 0000000000000200 RCX: 3e4406066e4f4abc
[ 256.237889] RDX: 0000000000000000 RSI: ffffffff824021e7 RDI: ffffffff8244b3d1
[ 256.239401] RBP: ffff888010e93a18 R08: 0000000000000028 R09: 000000000002001d
[ 256.240994] R10: 0000000000000000 R11: ffff8880101891b8 R12: 0000000000000002
[ 256.242545] R13: ffffffff82757a80 R14: ffffffff81253954 R15: 0000000000000000
[ 256.244085] ? __folio_memcg_unlock+0x48/0x48
[ 256.245597] lock_acquire+0xfa/0x10a
[ 256.247179] ? __folio_memcg_unlock+0x48/0x48
[ 256.248727] rcu_lock_acquire.constprop.0+0x24/0x27
[ 256.250293] ? __folio_memcg_unlock+0x48/0x48
[ 256.251736] mem_cgroup_iter+0x3d/0x178
[ 256.253245] shrink_node_memcgs+0x169/0x182
[ 256.254822] shrink_node+0x220/0x3d9
[ 256.256372] shrink_zones+0x10f/0x1ca
[ 256.257923] ? __this_cpu_preempt_check+0x13/0x15
[ 256.259437] do_try_to_free_pages+0x7a/0x192
[ 256.260947] try_to_free_mem_cgroup_pages+0x14b/0x213
[ 256.262405] try_charge_memcg+0x230/0x433
[ 256.263865] try_charge+0x12/0x17
[ 256.265236] charge_memcg+0x25/0x7c
[ 256.266615] __mem_cgroup_charge+0x28/0x3d
[ 256.267962] mem_cgroup_charge.constprop.0+0x1d/0x1f
[ 256.269290] do_anonymous_page+0x118/0x20c
[ 256.270712] handle_pte_fault+0x151/0x15f
[ 256.272015] __handle_mm_fault+0x39d/0x3ac
[ 256.273198] handle_mm_fault+0xc2/0x188
[ 256.274371] do_user_addr_fault+0x240/0x39d
[ 256.275595] exc_page_fault+0x1e1/0x204
[ 256.276787] asm_exc_page_fault+0x2c/0x40
[ 256.277906] RIP: 0033:0xd67250
[ 256.279017] Code: 02 f6 ff 49 89 c0 48 83 fb 08 73 1e f6 c3 04 75 39 48 85 db 74 2d c6 00 00 f6 c3 02 74 25 31 c0 66 41 89 44 18 fe eb 1b 66 90 <48> c7 44 18 f8 00 00 00 00 48 8d 4b ff 31 c0 4c 89 c7 48 c1 e9 03
[ 256.280355] RSP: 002b:00007fffffffc360 EFLAGS: 00010206
[ 256.281678] RAX: 00007ffff5972000 RBX: 00000000000000a8 RCX: 000000000000000c
[ 256.282955] RDX: 0000000000000006 RSI: 0000000000000017 RDI: 0000000000000986
[ 256.284264] RBP: 0000000000000002 R08: 00007ffff5972000 R09: 0000000000000987
[ 256.285554] R10: 0000000000000001 R11: 0000000001000001 R12: 00007ffff5969c80
[ 256.286930] R13: 0000000000000015 R14: 0000000000000000 R15: 0000000001eb4760
[ 256.288300] </TASK>
[ 256.289533] irq event stamp: 95044
[ 256.290776] hardirqs last enabled at (95043): [<ffffffff819ddf43>] irqentry_exit+0x67/0x75
[ 256.292147] hardirqs last disabled at (95044): [<ffffffff819db09c>] common_interrupt+0x1a/0xca
[ 256.293479] softirqs last enabled at (94982): [<ffffffff81c0036f>] __do_softirq+0x36f/0x3aa
[ 256.294754] softirqs last disabled at (94977): [<ffffffff81105c01>] __irq_exit_rcu+0x85/0xc1