On Wed, 18 Oct 2023 at 16:16, 'Kefeng Wang' via kasan-dev <kasan-dev@xxxxxxxxxxxxxxxx> wrote: > > The issue is easy to reproduced with large vmalloc, kindly ping... > > On 2023/9/15 8:58, Kefeng Wang wrote: > > Hi All, any suggest or comments,many thanks. > > > > On 2023/9/6 20:42, Kefeng Wang wrote: > >> This is a RFC, even patch3 is a hack to fix the softlock issue when > >> populate or depopulate pte with large region, looking forward to your > >> reply and advise, thanks. > > > > Here is full stack,for populate pte, > > > > [ C3] watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [insmod:458] > > [ C3] Modules linked in: test(OE+) > > [ C3] irq event stamp: 320776 > > [ C3] hardirqs last enabled at (320775): [<ffff8000815a0c98>] > > _raw_spin_unlock_irqrestore+0x98/0xb8 > > [ C3] hardirqs last disabled at (320776): [<ffff8000815816e0>] > > el1_interrupt+0x38/0xa8 > > [ C3] softirqs last enabled at (318174): [<ffff800080040ba8>] > > __do_softirq+0x658/0x7ac > > [ C3] softirqs last disabled at (318169): [<ffff800080047fd8>] > > ____do_softirq+0x18/0x30 > > [ C3] CPU: 3 PID: 458 Comm: insmod Tainted: G OE 6.5.0+ #595 > > [ C3] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 > > [ C3] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > > [ C3] pc : _raw_spin_unlock_irqrestore+0x50/0xb8 > > [ C3] lr : _raw_spin_unlock_irqrestore+0x98/0xb8 > > [ C3] sp : ffff800093386d70 > > [ C3] x29: ffff800093386d70 x28: 0000000000000801 x27: ffff0007ffffa9c0 > > [ C3] x26: 0000000000000000 x25: 000000000000003f x24: fffffc0004353708 > > [ C3] x23: ffff0006d476bad8 x22: fffffc0004353748 x21: 0000000000000000 > > [ C3] x20: ffff0007ffffafc0 x19: 0000000000000000 x18: 0000000000000000 > > [ C3] x17: ffff80008024e7fc x16: ffff80008055a8f0 x15: ffff80008024ec60 > > [ C3] x14: ffff80008024ead0 x13: ffff80008024e7fc x12: ffff6000fffff5f9 > > [ C3] x11: 1fffe000fffff5f8 x10: ffff6000fffff5f8 x9 : 1fffe000fffff5f8 > > [ C3] x8 : dfff800000000000 x7 : 00000000f2000000 x6 : dfff800000000000 > > [ C3] x5 : 00000000f2f2f200 x4 : dfff800000000000 x3 : ffff700012670d70 > > [ C3] x2 : 0000000000000001 x1 : c9a5dbfae610fa24 x0 : 000000000004e507 > > [ C3] Call trace: > > [ C3] _raw_spin_unlock_irqrestore+0x50/0xb8 > > [ C3] rmqueue_bulk+0x434/0x6b8 > > [ C3] get_page_from_freelist+0xdd4/0x1680 > > [ C3] __alloc_pages+0x244/0x508 > > [ C3] alloc_pages+0xf0/0x218 > > [ C3] __get_free_pages+0x1c/0x50 > > [ C3] kasan_populate_vmalloc_pte+0x30/0x188 > > [ C3] __apply_to_page_range+0x3ec/0x650 > > [ C3] apply_to_page_range+0x1c/0x30 > > [ C3] kasan_populate_vmalloc+0x60/0x70 > > [ C3] alloc_vmap_area.part.67+0x328/0xe50 > > [ C3] alloc_vmap_area+0x4c/0x78 > > [ C3] __get_vm_area_node.constprop.76+0x130/0x240 > > [ C3] __vmalloc_node_range+0x12c/0x340 > > [ C3] __vmalloc_node+0x8c/0xb0 > > [ C3] vmalloc+0x2c/0x40 > > [ C3] show_mem_init+0x1c/0xff8 [test] > > [ C3] do_one_initcall+0xe4/0x500 > > [ C3] do_init_module+0x100/0x358 > > [ C3] load_module+0x2e64/0x2fc8 > > [ C3] init_module_from_file+0xec/0x148 > > [ C3] idempotent_init_module+0x278/0x380 > > [ C3] __arm64_sys_finit_module+0x88/0xf8 > > [ C3] invoke_syscall+0x64/0x188 > > [ C3] el0_svc_common.constprop.1+0xec/0x198 > > [ C3] do_el0_svc+0x48/0xc8 > > [ C3] el0_svc+0x3c/0xe8 > > [ C3] el0t_64_sync_handler+0xa0/0xc8 > > [ C3] el0t_64_sync+0x188/0x190 > > > > and for depopuldate pte, > > > > [ C6] watchdog: BUG: soft lockup - CPU#6 stuck for 48s! [kworker/6:1:59] > > [ C6] Modules linked in: test(OE+) > > [ C6] irq event stamp: 39458 > > [ C6] hardirqs last enabled at (39457): [<ffff8000815a0c98>] > > _raw_spin_unlock_irqrestore+0x98/0xb8 > > [ C6] hardirqs last disabled at (39458): [<ffff8000815816e0>] > > el1_interrupt+0x38/0xa8 > > [ C6] softirqs last enabled at (39420): [<ffff800080040ba8>] > > __do_softirq+0x658/0x7ac > > [ C6] softirqs last disabled at (39415): [<ffff800080047fd8>] > > ____do_softirq+0x18/0x30 > > [ C6] CPU: 6 PID: 59 Comm: kworker/6:1 Tainted: G OEL > > 6.5.0+ #595 > > [ C6] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 > > [ C6] Workqueue: events drain_vmap_area_work > > [ C6] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > > [ C6] pc : _raw_spin_unlock_irqrestore+0x50/0xb8 > > [ C6] lr : _raw_spin_unlock_irqrestore+0x98/0xb8 > > [ C6] sp : ffff80008fe676b0 > > [ C6] x29: ffff80008fe676b0 x28: fffffc000601d310 x27: ffff000edf5dfa80 > > [ C6] x26: ffff000edf5dfad8 x25: 0000000000000000 x24: 0000000000000006 > > [ C6] x23: ffff000edf5dfad4 x22: 0000000000000000 x21: 0000000000000006 > > [ C6] x20: ffff0007ffffafc0 x19: 0000000000000000 x18: 0000000000000000 > > [ C6] x17: ffff8000805544b8 x16: ffff800080553d94 x15: ffff8000805c11b0 > > [ C6] x14: ffff8000805baeb0 x13: ffff800080047e10 x12: ffff6000fffff5f9 > > [ C6] x11: 1fffe000fffff5f8 x10: ffff6000fffff5f8 x9 : 1fffe000fffff5f8 > > [ C6] x8 : dfff800000000000 x7 : 00000000f2000000 x6 : dfff800000000000 > > [ C6] x5 : 00000000f2f2f200 x4 : dfff800000000000 x3 : ffff700011fcce98 > > [ C6] x2 : 0000000000000001 x1 : cf09d5450e2b4f7f x0 : 0000000000009a21 > > [ C6] Call trace: > > [ C6] _raw_spin_unlock_irqrestore+0x50/0xb8 > > [ C6] free_pcppages_bulk+0x2bc/0x3e0 > > [ C6] free_unref_page_commit+0x1fc/0x290 > > [ C6] free_unref_page+0x184/0x250 > > [ C6] __free_pages+0x154/0x1a0 > > [ C6] free_pages+0x88/0xb0 > > [ C6] kasan_depopulate_vmalloc_pte+0x58/0x80 > > [ C6] __apply_to_page_range+0x3ec/0x650 > > [ C6] apply_to_existing_page_range+0x1c/0x30 > > [ C6] kasan_release_vmalloc+0xa4/0x118 > > [ C6] __purge_vmap_area_lazy+0x4f4/0xe30 > > [ C6] drain_vmap_area_work+0x60/0xc0 > > [ C6] process_one_work+0x4cc/0xa38 > > [ C6] worker_thread+0x240/0x638 > > [ C6] kthread+0x1c8/0x1e0 > > [ C6] ret_from_fork+0x10/0x20 > > > > > > > >> > >> Kefeng Wang (3): > >> mm: kasan: shadow: add cond_resched() in kasan_populate_vmalloc_pte() > >> mm: kasan: shadow: move free_page() out of page table lock > >> mm: kasan: shadow: HACK add cond_resched_lock() in > >> kasan_depopulate_vmalloc_pte() The first 2 patches look ok, but yeah, the last is a hack. I also don't have any better suggestions, only more questions. Does this only happen on arm64? Do you have a minimal reproducer you can share?