On Mon, May 9, 2022 at 9:09 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote: > > On Mon, May 09 2022 at 18:50, Alexander Potapenko wrote: > > Indeed, calling kmsan_unpoison_memory() in irqentry_enter() was > > supposed to be enough, but we have code in kmsan_unpoison_memory() (as > > well as other runtime functions) that checks for kmsan_in_runtime() > > and bails out to prevent potential recursion if KMSAN code starts > > calling itself. > > > > kmsan_in_runtime() is implemented as follows: > > > > ============================================== > > static __always_inline bool kmsan_in_runtime(void) > > { > > if ((hardirq_count() >> HARDIRQ_SHIFT) > 1) > > return true; > > return kmsan_get_context()->kmsan_in_runtime; > > } > > ============================================== > > (see the code here: > > https://lore.kernel.org/lkml/20220426164315.625149-13-glider@xxxxxxxxxx/#Z31mm:kmsan:kmsan.h) > > > > If we are running in the task context (in_task()==true), > > kmsan_get_context() returns a per-task `struct *kmsan_ctx`. > > If `in_task()==false` and `hardirq_count()>>HARDIRQ_SHIFT==1`, it > > returns a per-CPU one. > > Otherwise kmsan_in_runtime() is considered true to avoid dealing with > > nested interrupts. > > > > So in the case when `hardirq_count()>>HARDIRQ_SHIFT` is greater than > > 1, kmsan_in_runtime() becomes a no-op, which leads to false positives. > > But, that'd only > 1 when there is a nested interrupt, which is not the > case. Interrupt handlers keep interrupts disabled. The last exception from > that rule was some legacy IDE driver which is gone by now. That's good to know, then we probably don't need this hardirq_count() check anymore. > So no, not a good explanation either. After looking deeper I see that unpoisoning was indeed skipped because kmsan_in_runtime() returned true, but I was wrong about the root cause. The problem was not caused by a nested hardirq, but rather by the fact that the KMSAN hook in irqentry_enter() was called with in_task()==1. Roughly said, T0 was running some code in the task context, then it started executing KMSAN instrumentation and entered the runtime by setting current->kmsan_ctx.kmsan_in_runtime. Then an IRQ kicked in and started calling asm_sysvec_apic_timer_interrupt() => sysvec_apic_timer_interrupt(regs) => irqentry_enter(regs) - but at that point in_task() was still true, therefore kmsan_unpoison_memory() became a no-op. As far as I can see, it is irq_enter_rcu() that makes in_task() return 0 by incrementing the preempt count in __irq_enter_raw(), so our unpoisoning can only work if we perform it after we enter the IRQ context. I think the best that can be done here is (as suggested above) to provide some kmsan_unpoison_pt_regs() function that will only be called from the entry points and won't be doing reentrancy checks. It should be safe, because unpoisoning boils down to calculating shadow/origin addresses and calling memset() on them, no instrumented code will be involved. We could try to figure out the places in idtentry code where normal kmsan_unpoison_memory() can be called in IRQ context, but as far as I can see it will depend on the type of the entry point. Another way to deal with the problem is to not rely on in_task(), but rather use some per-cpu counter in irqentry_enter()/irqentry_exit() to figure out whether we are in IRQ code already. However this is only possible irqentry_enter() itself guarantees that the execution cannot be rescheduled to another CPU - is that the case? > Thanks, > > tglx > > -- > You received this message because you are subscribed to the Google Groups "kasan-dev" group. > To unsubscribe from this group and stop receiving emails from it, send an email to kasan-dev+unsubscribe@xxxxxxxxxxxxxxxx. > To view this discussion on the web visit https://groups.google.com/d/msgid/kasan-dev/871qx2r09k.ffs%40tglx. -- Alexander Potapenko Software Engineer Google Germany GmbH Erika-Mann-Straße, 33 80636 München Geschäftsführer: Paul Manicle, Liana Sebastian Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg Diese E-Mail ist vertraulich. Falls Sie diese fälschlicherweise erhalten haben sollten, leiten Sie diese bitte nicht an jemand anderes weiter, löschen Sie alle Kopien und Anhänge davon und lassen Sie mich bitte wissen, dass die E-Mail an die falsche Person gesendet wurde. This e-mail is confidential. If you received this communication by mistake, please don't forward it to anyone else, please erase all copies and attachments, and please let me know that it has gone to the wrong person.