Re: [PATCH v2] kasan: fix deadlock in start_report()

Andrey Konovalov <andreyknvl@xxxxxxxxx> · Mon, 27 Feb 2023 03:13:45 +0100

On Wed, Feb 15, 2023 at 2:22 PM 袁帅(Shuai Yuan) <yuanshuai@xxxxxxxx> wrote:
>
> I have got valid information to clarify the problem and solutions. I made
> a few changes to the code to do this.
>
> a) I was testing on a device that had hardware issues with MTE,
>     and the memory tag sometimes changed randomly.

Ah, I see. Faulty hardware explains the problem. Thank you!

> f) From the above log, you can see that the system tried to call kasan_report() twice,
>    because we access the kmem_cache through a tagged address and its tag has changed.
>    Normally this doesn't happen easily. So I think we can add kasan_reset_tag() to handle
>    the kmem_cache address.
>
>    For example, the following changes are used for the latest kernel version.
> diff --git a/mm/kasan/report.c b/mm/kasan/report.c
> --- a/mm/kasan/report.c
> +++ b/mm/kasan/report.c
> @@ -412,7 +412,7 @@ static void complete_report_info(struct kasan_report_info *info)
>         slab = kasan_addr_to_slab(addr);
>         if (slab) {
> -               info->cache = slab->slab_cache;
> +               info->cache = kasan_reset_tag(slab->slab_cache);

This fixes the problem for accesses to slab_cache, but KASAN reporting
code also accesses stack depot memory and calls other routines that
might access (faulty) tagged memory. And the accessed addresses aren't
exposed to KASAN code, so we can't use kasan_reset_tag for those.
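
To illustrate the limitation, here is a minimal userspace sketch. It is
a model under stated assumptions, not kernel code: every model_* name
is hypothetical, the tag is assumed to live in the top byte of the
pointer, and 0xFF is assumed to be the match-all tag that resetting
produces for kernel pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model; all model_* names are hypothetical, not kernel APIs. */
#define MODEL_TAG_SHIFT   56
#define MODEL_TAG_KERNEL  0xFFu   /* assumed match-all tag */

static uint8_t model_get_tag(uint64_t ptr)
{
    return (uint8_t)(ptr >> MODEL_TAG_SHIFT);
}

static uint64_t model_set_tag(uint64_t ptr, uint8_t tag)
{
    return (ptr & ~((uint64_t)0xFF << MODEL_TAG_SHIFT)) |
           ((uint64_t)tag << MODEL_TAG_SHIFT);
}

/* Models the effect of resetting a tag: the pointer becomes match-all. */
static uint64_t model_reset_tag(uint64_t ptr)
{
    return model_set_tag(ptr, MODEL_TAG_KERNEL);
}

/* Models a checked access: true means the access would raise a tag fault. */
static bool model_access_faults(uint64_t ptr, uint8_t mem_tag)
{
    uint8_t ptr_tag = model_get_tag(ptr);

    return ptr_tag != MODEL_TAG_KERNEL && ptr_tag != mem_tag;
}

/* Usage: the memory tag flipped from 0xF3 to 0xF7 behind our back. */
void model_demo(void)
{
    uint64_t cache_ptr = model_set_tag(0xffff000012345000ULL, 0xF3);

    /* The still-tagged pointer faults on the changed memory tag. */
    assert(model_access_faults(cache_ptr, 0xF7));
    /* The reset pointer does not, but only callers holding it benefit. */
    assert(!model_access_faults(model_reset_tag(cache_ptr), 0xF7));
}
```

The reset only helps for pointers that pass through code we control.
Any routine that dereferences its own still-tagged pointer behaves like
the first assert above.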

I wonder what would be a good solution here. I really don't want to
use kasan_depth or some other global/per-cpu flag here, as it would be
too good a target for attackers wishing to bypass MTE. Perhaps
disabling MTE once reporting has started would be a better option:
calling the disabling routine would arguably be a harder task for an
attacker than overwriting a flag.

+Catalin, would it be acceptable to implement a routine that disables
in-kernel MTE tag checking (until the next
mte_enable_kernel_sync/async/asymm call)? It would do this the same
way an MTE fault does, but without the fault itself. I.e., could we
expose the do_tag_recovery functionality without the report_tag_fault
part?

TL;DR on the problem: Besides relying on CPU tag checks, KASAN also
does explicit tag checks to detect double-frees and similar problems,
see the calls to kasan_report_invalid_free. Thus, when e.g. a
double-free report is printed, MTE checking is still on. This results
in a deadlock if invalid memory is accessed during KASAN reporting.
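
The sequence can be sketched as a small userspace model. This is only
an illustration under assumptions: all names are hypothetical, the
report lock is modeled as a plain flag, and a nested report that finds
the lock held is counted instead of actually spinning. The
disable_checks parameter models the proposed "turn tag checking off
once reporting starts" behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only; every name here is hypothetical, not a kernel API. */
static bool tag_checks_enabled;  /* models in-kernel MTE tag checking */
static bool report_lock_held;    /* models a non-recursive report lock */
static int  would_deadlock;      /* times a nested report hit the held lock */

static void model_report(uint8_t ptr_tag, uint8_t mem_tag, bool disable_checks);

/* Models a checked access made by the reporting code itself. */
static void model_access(uint8_t ptr_tag, uint8_t mem_tag, bool disable_checks)
{
    if (tag_checks_enabled && ptr_tag != mem_tag)
        model_report(ptr_tag, mem_tag, disable_checks); /* nested report */
}

static void model_report(uint8_t ptr_tag, uint8_t mem_tag, bool disable_checks)
{
    if (report_lock_held) {
        /* A real spinlock would spin here forever; the model just counts. */
        would_deadlock++;
        return;
    }
    report_lock_held = true;
    if (disable_checks)
        tag_checks_enabled = false;  /* proposed: checks off while reporting */

    /* Reporting touches metadata whose tag was corrupted as well. */
    model_access(ptr_tag, mem_tag, disable_checks);

    report_lock_held = false;
}

/* Runs one report with mismatched tags; returns how often it would deadlock. */
int run_model(bool disable_checks)
{
    tag_checks_enabled = true;
    report_lock_held = false;
    would_deadlock = 0;

    model_report(0xF3, 0xF7, disable_checks);

    assert(!report_lock_held);
    return would_deadlock;
}
```

In this model, run_model(false) hits the held lock once, while
run_model(true) completes the report without a nested one.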

Thanks!