On Wed, Feb 15, 2023 at 2:22 PM 袁帅(Shuai Yuan) <yuanshuai@xxxxxxxx> wrote:
>
> I have got valid information to clarify the problem and solutions. I made
> a few changes to the code to do this.
>
> a) I was testing on a device that had hardware issues with MTE,
> and the memory tag sometimes changed randomly.

Ah, I see. Faulty hardware explains the problem. Thank you!

> f) From the above log, you can see that the system tried to call kasan_report() twice,
> because we visit tag address by kmem_cache and this tag have change..
> Normally this doesn't happen easily. So I think we can add kasan_reset_tag() to handle
> the kmem_cache address.
>
> For example, the following changes are used for the latest kernel version.
> diff --git a/mm/kasan/report.c b/mm/kasan/report.c
> --- a/mm/kasan/report.c
> +++ b/mm/kasan/report.c
> @@ -412,7 +412,7 @@ static void complete_report_info(struct kasan_report_info *info)
>         slab = kasan_addr_to_slab(addr);
>         if (slab) {
> -               info->cache = slab->slab_cache;
> +               info->cache = kasan_reset_tag(slab->slab_cache);

This fixes the problem for accesses to slab_cache, but KASAN reporting
code also accesses stack depot memory and calls other routines that
might access (faulty) tagged memory. And the accessed addresses aren't
exposed to KASAN code, so we can't use kasan_reset_tag for those.

I wonder what would be a good solution here. I really don't want to
use kasan_depth or some other global/per-cpu flag here, as it would be
too good of a target for attackers wishing to bypass MTE. Perhaps,
disabling MTE once reporting started would be a better option: calling
the disabling routine would arguably be a harder task for an attacker
than overwriting a flag.

+Catalin, would it be acceptable to implement a routine that disables
in-kernel MTE tag checking (until the next
mte_enable_kernel_sync/async/asymm call)? In a similar way an MTE
fault does this, but without the fault itself.
I.e., expose part of the do_tag_recovery functionality, without
report_tag_fault?

TL;DR on the problem: Besides relying on CPU tag checks, KASAN also
does explicit tag checks to detect double-frees and similar problems;
see the calls to kasan_report_invalid_free. Thus, when e.g. a
double-free report is printed, MTE checking is still on. This results
in a deadlock if invalid memory is accessed during KASAN reporting.

Thanks!