On Mon, 2023-12-11 at 12:50 +0100, Alexander Potapenko wrote: > On Tue, Nov 21, 2023 at 11:06 PM Ilya Leoshkevich <iii@xxxxxxxxxxxxx> > wrote: > > > > Like for KASAN, it's useful to temporarily disable KMSAN checks > > around, > > e.g., redzone accesses. Introduce kmsan_disable_current() and > > kmsan_enable_current(), which are similar to their KASAN > > counterparts. > > Initially we used to have this disablement counter in KMSAN, but > adding it uncontrollably can result in KMSAN not functioning > properly. > E.g. forgetting to call kmsan_disable_current() or underflowing the > counter will break reporting. > We'd better put this API in include/linux/kmsan.h to indicate it > should be discouraged. > > > Even though it's not strictly necessary, make them reentrant, in > > order > > to match the KASAN behavior. > > Until this becomes strictly necessary, I think we'd better > KMSAN_WARN_ON if the counter is re-entered. I encountered a case when we are freeing memory from an interrupt handler: [ 149.840553] ------------[ cut here ]------------ [ 149.840649] WARNING: CPU: 1 PID: 181 at mm/kmsan/hooks.c:447 kmsan_disable_current+0x2e/0x40 [ 149.840790] Modules linked in: [ 149.840894] CPU: 1 PID: 181 Comm: (direxec) Tainted: G B W N 6.7.0-rc5-gd34a4b46f382 #13 [ 149.841003] Hardware name: IBM 3931 A01 704 (KVM/Linux) [ 149.841094] Krnl PSW : 0404c00180000000 000000000197dbc2 (kmsan_disable_current+0x32/0x40) [ 149.841276] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 [ 149.841420] Krnl GPRS: 0000000000000040 0000000096914100 0000000000001000 0000000000000001 [ 149.841518] 0000036d827daee0 0000000007c97008 0000000080096500 0000000092f4f000 [ 149.841617] 0000036d00000000 0000000000000000 0000000000000040 0000000000000000 [ 149.841712] 0000000092f4efc0 00000001ff710f60 000000000193acba 0000037f0008f710 [ 149.841893] Krnl Code: 000000000197dbb6: eb0018640352 mviy 14436(%r1),0 [ 149.841893] 000000000197dbbc: 07fe bcr 15,%r14 [ 149.841893] #000000000197dbbe: af000000 mc 0,0 [ 149.841893] >000000000197dbc2: a7f4fffa brc 15,000000000197dbb6 [ 149.841893] 000000000197dbc6: 0700 bcr 0,%r0 [ 149.841893] 000000000197dbc8: 0700 bcr 0,%r0 [ 149.841893] 000000000197dbca: 0700 bcr 0,%r0 [ 149.841893] 000000000197dbcc: 0700 bcr 0,%r0 [ 149.842438] Call Trace: 15:37:25 [90/1838] [ 149.842510] [<000000000197dbc2>] kmsan_disable_current+0x32/0x40 [ 149.842631] ([<000000000193ac14>] slab_pad_check+0x1d4/0xac0) [ 149.842738] [<0000000001949222>] free_to_partial_list+0x1d72/0x3b80 [ 149.842850] [<0000000001947066>] __slab_free+0xd86/0x11d0 [ 149.842956] [<00000000019111e8>] kmem_cache_free+0x15d8/0x25d0 [ 149.843062] [<0000000000229e3a>] __tlb_remove_table+0x20a/0xa50 [ 149.843174] [<00000000016c7f98>] tlb_remove_table_rcu+0x98/0x120 [ 149.843291] [<000000000083e1c6>] rcu_core+0x15b6/0x54b0 [ 149.843406] [<00000000069c3c0e>] __do_softirq+0xa1e/0x2178 [ 149.843514] [<00000000003467b4>] irq_exit_rcu+0x2c4/0x630 [ 149.843623] [<0000000006949f6e>] do_ext_irq+0x9e/0x120 [ 149.843736] [<00000000069c18d4>] ext_int_handler+0xc4/0xf0 [ 149.843841] [<000000000197e428>] kmsan_get_metadata+0x68/0x280 [ 149.843950] [<000000000197e344>] kmsan_get_shadow_origin_ptr+0x74/0xf0 [ 149.844071] [<000000000197ba3a>] __msan_metadata_ptr_for_load_8+0x2a/0x40 [ 149.844192] [<0000000000184e4a>] unwind_get_return_address+0xda/0x150 [ 149.844313] [<000000000018fd12>] arch_stack_walk+0x172/0x2f0 [ 149.844417] [<00000000008f1af0>] stack_trace_save+0x100/0x160 [ 149.844529] [<000000000197af22>] kmsan_internal_chain_origin+0x62/0xe0 [ 149.844647] [<000000000197c1f0>] __msan_chain_origin+0xd0/0x160 [ 149.844763] [<00000000068b3ba4>] memchr_inv+0x5b4/0xb20 [ 149.844877] [<000000000193e730>] check_bytes_and_report+0xa0/0xd30 [ 149.844986] [<000000000193b920>] check_object+0x420/0x17d0 [ 149.845092] [<000000000194aa8a>] free_to_partial_list+0x35da/0x3b80 [ 149.845202] [<0000000001947066>] __slab_free+0xd86/0x11d0 [ 149.845308] [<00000000019111e8>] kmem_cache_free+0x15d8/0x25d0 [ 149.845414] [<00000000016bc2fe>] exit_mmap+0x87e/0x1200 [ 149.845524] [<00000000002f315c>] mmput+0x13c/0x5b0 [ 149.845632] [<0000000001b9d634>] exec_mmap+0xc34/0x1230 [ 149.845744] [<0000000001b996c2>] begin_new_exec+0xcf2/0x2520 [ 149.845857] [<0000000001f6a084>] load_elf_binary+0x2364/0x67d0 [ 149.845971] [<0000000001ba5ba4>] bprm_execve+0x25b4/0x4010 [ 149.846083] [<0000000001baa7e6>] do_execveat_common+0x2436/0x2600 [ 149.846200] [<0000000001ba78f8>] __s390x_sys_execve+0x108/0x140 [ 149.846314] [<000000000011b192>] do_syscall+0x4c2/0x690 [ 149.846424] [<0000000006949d78>] __do_syscall+0x98/0xe0 [ 149.846536] [<00000000069c1640>] system_call+0x70/0xa0 [ 149.846638] INFO: lockdep is turned off. [ 149.846846] Last Breaking-Event-Address: [ 149.846916] [<000000000197dbb2>] kmsan_disable_current+0x22/0x40 [ 149.847057] irq event stamp: 0 [ 149.847128] hardirqs last enabled at (0): [<0000000000000000>] 0x0 [ 149.847227] hardirqs last disabled at (0): [<00000000002f8f46>] copy_process+0x21f6/0x8b20 [ 149.847344] softirqs last enabled at (0): [<00000000002f8f80>] copy_process+0x2230/0x8b20 [ 149.847461] softirqs last disabled at (0): [<0000000000000000>] 0x0 [ 149.847559] ---[ end trace 0000000000000000 ]--- [ 149.865485] ===================================================== Using a counter resolves this issue, but, of course, at the expense of not reporting valid issues in the interrupt handler. Unfortunately I don't see another easy way to solve this problem. The possibilities that come to mind are providing uninstrumented memchr_inv() or disablement flags for each context, but I'm not sure if we want to go there, especially since KASAN already has this limitation.