On 2023/10/19 14:17, Uladzislau Rezki wrote:
On Thu, Oct 19, 2023 at 09:40:10AM +0800, Kefeng Wang wrote:
On 2023/10/19 0:37, Marco Elver wrote:
On Wed, 18 Oct 2023 at 16:16, 'Kefeng Wang' via kasan-dev
<kasan-dev@xxxxxxxxxxxxxxxx> wrote:
The issue is easy to reproduce with a large vmalloc, kindly ping...
On 2023/9/15 8:58, Kefeng Wang wrote:
Hi all, any suggestions or comments? Many thanks.
On 2023/9/6 20:42, Kefeng Wang wrote:
This is an RFC; even patch 3 is a hack to fix the soft lockup issue when
populating or depopulating PTEs over a large region. Looking forward to
your reply and advice, thanks.
Here is the full stack. For populating PTEs:
[ C3] watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [insmod:458]
[ C3] Modules linked in: test(OE+)
[ C3] irq event stamp: 320776
[ C3] hardirqs last enabled at (320775): [<ffff8000815a0c98>] _raw_spin_unlock_irqrestore+0x98/0xb8
[ C3] hardirqs last disabled at (320776): [<ffff8000815816e0>] el1_interrupt+0x38/0xa8
[ C3] softirqs last enabled at (318174): [<ffff800080040ba8>] __do_softirq+0x658/0x7ac
[ C3] softirqs last disabled at (318169): [<ffff800080047fd8>] ____do_softirq+0x18/0x30
[ C3] CPU: 3 PID: 458 Comm: insmod Tainted: G OE 6.5.0+ #595
[ C3] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
[ C3] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ C3] pc : _raw_spin_unlock_irqrestore+0x50/0xb8
[ C3] lr : _raw_spin_unlock_irqrestore+0x98/0xb8
[ C3] sp : ffff800093386d70
[ C3] x29: ffff800093386d70 x28: 0000000000000801 x27: ffff0007ffffa9c0
[ C3] x26: 0000000000000000 x25: 000000000000003f x24: fffffc0004353708
[ C3] x23: ffff0006d476bad8 x22: fffffc0004353748 x21: 0000000000000000
[ C3] x20: ffff0007ffffafc0 x19: 0000000000000000 x18: 0000000000000000
[ C3] x17: ffff80008024e7fc x16: ffff80008055a8f0 x15: ffff80008024ec60
[ C3] x14: ffff80008024ead0 x13: ffff80008024e7fc x12: ffff6000fffff5f9
[ C3] x11: 1fffe000fffff5f8 x10: ffff6000fffff5f8 x9 : 1fffe000fffff5f8
[ C3] x8 : dfff800000000000 x7 : 00000000f2000000 x6 : dfff800000000000
[ C3] x5 : 00000000f2f2f200 x4 : dfff800000000000 x3 : ffff700012670d70
[ C3] x2 : 0000000000000001 x1 : c9a5dbfae610fa24 x0 : 000000000004e507
[ C3] Call trace:
[ C3] _raw_spin_unlock_irqrestore+0x50/0xb8
[ C3] rmqueue_bulk+0x434/0x6b8
[ C3] get_page_from_freelist+0xdd4/0x1680
[ C3] __alloc_pages+0x244/0x508
[ C3] alloc_pages+0xf0/0x218
[ C3] __get_free_pages+0x1c/0x50
[ C3] kasan_populate_vmalloc_pte+0x30/0x188
[ C3] __apply_to_page_range+0x3ec/0x650
[ C3] apply_to_page_range+0x1c/0x30
[ C3] kasan_populate_vmalloc+0x60/0x70
[ C3] alloc_vmap_area.part.67+0x328/0xe50
[ C3] alloc_vmap_area+0x4c/0x78
[ C3] __get_vm_area_node.constprop.76+0x130/0x240
[ C3] __vmalloc_node_range+0x12c/0x340
[ C3] __vmalloc_node+0x8c/0xb0
[ C3] vmalloc+0x2c/0x40
[ C3] show_mem_init+0x1c/0xff8 [test]
[ C3] do_one_initcall+0xe4/0x500
[ C3] do_init_module+0x100/0x358
[ C3] load_module+0x2e64/0x2fc8
[ C3] init_module_from_file+0xec/0x148
[ C3] idempotent_init_module+0x278/0x380
[ C3] __arm64_sys_finit_module+0x88/0xf8
[ C3] invoke_syscall+0x64/0x188
[ C3] el0_svc_common.constprop.1+0xec/0x198
[ C3] do_el0_svc+0x48/0xc8
[ C3] el0_svc+0x3c/0xe8
[ C3] el0t_64_sync_handler+0xa0/0xc8
[ C3] el0t_64_sync+0x188/0x190
and for depopulating PTEs:
[ C6] watchdog: BUG: soft lockup - CPU#6 stuck for 48s! [kworker/6:1:59]
[ C6] Modules linked in: test(OE+)
[ C6] irq event stamp: 39458
[ C6] hardirqs last enabled at (39457): [<ffff8000815a0c98>] _raw_spin_unlock_irqrestore+0x98/0xb8
[ C6] hardirqs last disabled at (39458): [<ffff8000815816e0>] el1_interrupt+0x38/0xa8
[ C6] softirqs last enabled at (39420): [<ffff800080040ba8>] __do_softirq+0x658/0x7ac
[ C6] softirqs last disabled at (39415): [<ffff800080047fd8>] ____do_softirq+0x18/0x30
[ C6] CPU: 6 PID: 59 Comm: kworker/6:1 Tainted: G OEL 6.5.0+ #595
[ C6] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
[ C6] Workqueue: events drain_vmap_area_work
[ C6] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ C6] pc : _raw_spin_unlock_irqrestore+0x50/0xb8
[ C6] lr : _raw_spin_unlock_irqrestore+0x98/0xb8
[ C6] sp : ffff80008fe676b0
[ C6] x29: ffff80008fe676b0 x28: fffffc000601d310 x27: ffff000edf5dfa80
[ C6] x26: ffff000edf5dfad8 x25: 0000000000000000 x24: 0000000000000006
[ C6] x23: ffff000edf5dfad4 x22: 0000000000000000 x21: 0000000000000006
[ C6] x20: ffff0007ffffafc0 x19: 0000000000000000 x18: 0000000000000000
[ C6] x17: ffff8000805544b8 x16: ffff800080553d94 x15: ffff8000805c11b0
[ C6] x14: ffff8000805baeb0 x13: ffff800080047e10 x12: ffff6000fffff5f9
[ C6] x11: 1fffe000fffff5f8 x10: ffff6000fffff5f8 x9 : 1fffe000fffff5f8
[ C6] x8 : dfff800000000000 x7 : 00000000f2000000 x6 : dfff800000000000
[ C6] x5 : 00000000f2f2f200 x4 : dfff800000000000 x3 : ffff700011fcce98
[ C6] x2 : 0000000000000001 x1 : cf09d5450e2b4f7f x0 : 0000000000009a21
[ C6] Call trace:
[ C6] _raw_spin_unlock_irqrestore+0x50/0xb8
[ C6] free_pcppages_bulk+0x2bc/0x3e0
[ C6] free_unref_page_commit+0x1fc/0x290
[ C6] free_unref_page+0x184/0x250
[ C6] __free_pages+0x154/0x1a0
[ C6] free_pages+0x88/0xb0
[ C6] kasan_depopulate_vmalloc_pte+0x58/0x80
[ C6] __apply_to_page_range+0x3ec/0x650
[ C6] apply_to_existing_page_range+0x1c/0x30
[ C6] kasan_release_vmalloc+0xa4/0x118
[ C6] __purge_vmap_area_lazy+0x4f4/0xe30
[ C6] drain_vmap_area_work+0x60/0xc0
[ C6] process_one_work+0x4cc/0xa38
[ C6] worker_thread+0x240/0x638
[ C6] kthread+0x1c8/0x1e0
[ C6] ret_from_fork+0x10/0x20
Kefeng Wang (3):
mm: kasan: shadow: add cond_resched() in kasan_populate_vmalloc_pte()
mm: kasan: shadow: move free_page() out of page table lock
mm: kasan: shadow: HACK add cond_resched_lock() in
kasan_depopulate_vmalloc_pte()
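For reference, here is a rough sketch of what the first two changes boil
down to (illustrative only, not the actual diffs; it assumes
kasan_populate_vmalloc_pte() and kasan_depopulate_vmalloc_pte() in
mm/kasan/shadow.c keep roughly their current shape):

/* Patch 1 (sketch): the callback runs once per shadow PTE with no lock
 * held, so let it reschedule on huge mappings. */
static int kasan_populate_vmalloc_pte(pte_t *ptep, unsigned long addr,
				      void *unused)
{
	unsigned long page;
	pte_t pte;

	if (likely(!pte_none(*ptep)))
		return 0;

	cond_resched();				/* new */

	page = __get_free_page(GFP_KERNEL);
	if (!page)
		return -ENOMEM;

	memset((void *)page, KASAN_VMALLOC_INVALID, PAGE_SIZE);
	pte = pfn_pte(PFN_DOWN(__pa(page)), PAGE_KERNEL);

	spin_lock(&init_mm.page_table_lock);
	if (likely(pte_none(*ptep))) {
		set_pte_at(&init_mm, addr, ptep, pte);
		page = 0;
	}
	spin_unlock(&init_mm.page_table_lock);

	if (page)
		free_page(page);

	return 0;
}

/* Patch 2 (sketch): record the shadow page under init_mm.page_table_lock,
 * but free it only after the lock is dropped. */
static int kasan_depopulate_vmalloc_pte(pte_t *ptep, unsigned long addr,
					void *unused)
{
	unsigned long page = 0;

	spin_lock(&init_mm.page_table_lock);
	if (likely(!pte_none(*ptep))) {
		page = (unsigned long)__va(pte_pfn(*ptep) << PAGE_SHIFT);
		pte_clear(&init_mm, addr, ptep);
	}
	spin_unlock(&init_mm.page_table_lock);

	if (page)
		free_page(page);		/* moved out of the lock */

	return 0;
}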
The first 2 patches look ok, but yeah, the last is a hack. I also
don't have any better suggestions, only more questions.
Thanks Marco. Maybe we could convert free_vmap_area_lock from a spinlock to
a mutex, but only when KASAN is enabled?
I do not think it is a good suggestion. Could you please clarify the
reason for such a conversion?
See the call trace of the soft lockup: when mapping/unmapping the vmalloc
buffer, KASAN has to populate and later depopulate the vmalloc shadow PTEs,
which takes much longer than on a non-KASAN kernel. For the unmap side
there is already a cond_resched_lock() in __purge_vmap_area_lazy(), but far
more time is spent under the spinlock (free_vmap_area_lock), and we cannot
add cond_resched_lock() inside kasan_depopulate_vmalloc_pte(). If the
spinlock were converted to a mutex, we could add a cond_resched() into the
KASAN depopulate path; that is why I suggested such a conversion when KASAN
is enabled. The conversion may not be correct, though, so is there a better
solution?
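To make the constraint concrete, here is a simplified view of the
depopulate path from the call trace above (pseudo, details elided):

  drain_vmap_area_work()
    __purge_vmap_area_lazy()
      spin_lock(&free_vmap_area_lock);
      for each lazily-freed vmap area {
          kasan_release_vmalloc()              /* may walk millions of shadow PTEs */
            apply_to_existing_page_range()
              kasan_depopulate_vmalloc_pte()   /* cannot sleep: spinlock held */
          cond_resched_lock(&free_vmap_area_lock);   /* only between areas */
      }
      spin_unlock(&free_vmap_area_lock);

If free_vmap_area_lock were a mutex (when KASAN is enabled), a plain
cond_resched() inside kasan_depopulate_vmalloc_pte() would become legal.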
Does this only happen on arm64?
Our test case runs in arm64 QEMU (the host is x86), so it runs much slower
than on a real board.
Do you have a minimal reproducer you can share?
Here is the code in the test driver:
void *buf = vmalloc(40UL << 30);
vfree(buf);
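If it helps, here is a minimal, self-contained sketch of such a reproducer
module (illustrative; the real test driver may differ, and the init
function name is simply taken from the call trace above):

#include <linux/module.h>
#include <linux/vmalloc.h>

static int __init show_mem_init(void)
{
	/* 40 GiB of vmalloc: with generic KASAN this forces roughly 5 GiB
	 * of shadow to be populated one PTE at a time, which is what trips
	 * the soft lockup detector. */
	void *buf = vmalloc(40UL << 30);

	if (!buf)
		return -ENOMEM;

	vfree(buf);	/* the shadow is depopulated later, when the lazily
			 * freed area is purged by drain_vmap_area_work() */
	return 0;
}
module_init(show_mem_init);

MODULE_LICENSE("GPL");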
What is a test driver? Why do you need 40G of memory, and for which purpose?
This is just to accelerate reproduction of the issue above; the root cause
is too much time spent in kasan_populate_vmalloc() and
kasan_release_vmalloc().
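For scale (assuming 4 KiB pages and the generic KASAN shadow ratio of 1/8):
a 40 GiB vmalloc area needs 40 GiB / 8 = 5 GiB of shadow, i.e.
5 GiB / 4 KiB ≈ 1.3 million shadow PTEs. Each of them gets a freshly
allocated and memset page on vmalloc(), and is later cleared and freed
again under free_vmap_area_lock when the area is purged, so the total time
easily exceeds the soft lockup threshold.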
Thanks.
--
Uladzislau Rezki