Re: [PATCH v3 28/46] kmsan: entry: handle register passing from uninstrumented code

Thomas Gleixner <tglx@xxxxxxxxxxxxx> · Thu, 12 May 2022 18:17:11 +0200

On Thu, May 12 2022 at 14:24, Alexander Potapenko wrote:
> On Mon, May 9, 2022 at 9:09 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>> > So in the case when `hardirq_count()>>HARDIRQ_SHIFT` is greater than
>> > 1, kmsan_in_runtime() becomes a no-op, which leads to false positives.
>>
>> But, that'd only > 1 when there is a nested interrupt, which is not the
>> case. Interrupt handlers keep interrupts disabled. The last exception from
>> that rule was some legacy IDE driver which is gone by now.
>
> That's good to know, then we probably don't need this hardirq_count()
> check anymore.
>
>> So no, not a good explanation either.
>
> After looking deeper I see that unpoisoning was indeed skipped because
> kmsan_in_runtime() returned true, but I was wrong about the root
> cause.
> The problem was not caused by a nested hardirq, but rather by the fact
> that the KMSAN hook in irqentry_enter() was called with in_task()==1.

Argh, the preempt counter increment happens _after_ irqentry_enter().

> I think the best that can be done here is (as suggested above) to
> provide some kmsan_unpoison_pt_regs() function that will only be
> called from the entry points and won't be doing reentrancy checks.
> It should be safe, because unpoisoning boils down to calculating
> shadow/origin addresses and calling memset() on them, no instrumented
> code will be involved.

If you keep them where I placed them, then there is no need for a
noinstr function. It's already instrumentable.

> We could try to figure out the places in idtentry code where normal
> kmsan_unpoison_memory() can be called in IRQ context, but as far as I
> can see it will depend on the type of the entry point.

NMI is covered as it increments before it invokes the unpoison().

Let me figure out why we increment the preempt count late for
interrupts. IIRC it's for symmetry reasons related to softirq processing
on return, but let me double check.

> Another way to deal with the problem is to not rely on in_task(), but
> rather use some per-cpu counter in irqentry_enter()/irqentry_exit() to
> figure out whether we are in IRQ code already.

Well, if you have a irqentry() specific unpoison, then you know the
context, right?

> However this is only possible irqentry_enter() itself guarantees that
> the execution cannot be rescheduled to another CPU - is that the case?

Obviously. It runs with interrupts disabled and eventually on a
separate interrupt stack.

Thanks,

        tglx