> On Friday, February 10, 2023 at 6:54 AM Andrey Konovalov <andreyknvl@xxxxxxxxx> wrote:
> > On Thu, Feb 9, 2023 at 11:44 AM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> > >
> > > On Thu, 9 Feb 2023 at 10:19, 袁帅(Shuai Yuan) <yuanshuai@xxxxxxxx> wrote:
> > > >
> > > > Hi Dmitry Vyukov
> > > >
> > > > Thanks, I see that your means.
> > > >
> > > > Currently, report_suppressed() seem not work in Kasan-HW mode, it
> > > > always return false.
> > > > Do you think should change the report_suppressed function?
> > > > I don't know why CONFIG_KASAN_HW_TAGS was blocked separately
> > > > before.
> > >
> > > That logic was added by Andrey in:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c068664c97c7cf
> > >
> > > Andrey, can we make report_enabled() check current->kasan_depth and
> > > remove report_suppressed()?
> >
> > I decided to not use kasan_depth for HW_TAGS, as we can always use a
> > match-all tag to make "invalid" memory accesses.
> >
> > I think we can fix the reporting code to do exactly that so that it
> > doesn't cause MTE faults.
> >
> > Shuai, could you clarify, at which point due kasan_report_invalid_free
> > an MTE exception is raised in your tests?
>
> Yes, I need some time to clarify this problem with a clear log by test.
>

Hi Andrey and Dmitry,

I now have enough information to clarify the problem and to propose a
solution. I made a few changes to the code to collect it.

a) I tested on a device that has hardware issues with MTE: the memory
   tag sometimes changes randomly.
b) I ran the test on kernel version 5.15, but judging from the code, the
   problem should also exist on the latest kernel version.
c) I ran the kernel on a single core with "maxcpus=1".
d) Code modifications:
   (1) Call dump_stack_lvl(KERN_ERR) when start_report() returns false.
       This is done on top of the current patch v2.
   (2) Add some logging to print_address_description() to show the
       kmem_cache address and its memory tag.
       https://elixir.bootlin.com/linux/v5.15.94/source/mm/kasan/report.c#L252

@@ -255,24 +260,25 @@ static void print_address_description(void *addr, u8 tag)
 	dump_stack_lvl(KERN_ERR);
 	pr_err("\n");
-
+	pr_err("ys:1\n");
 	if (page && PageSlab(page)) {
 		struct kmem_cache *cache = page->slab_cache;
-		void *object = nearest_obj(cache, page,addr);
+		void *object = NULL;
+		pr_err("ys:cache start %llx, mtag:%x, page_address:%llx\n",
+		       cache, hw_get_mem_tag(cache), page_address(page));
+		object = nearest_obj(cache, page, addr);
+		pr_err("ys:cache end %llx, object %llx, page_address:%llx\n",
+		       cache, object, page_address(page));
 		describe_object(cache, object, addr, tag);
 	}

   (3) Add kasan_enable_tagging() to KUNIT_EXPECT_KASAN_FAIL in
       https://elixir.bootlin.com/linux/v5.15.94/source/lib/test_kasan.c#L94
       This ensures that the KUnit tests can run on this unstable device.
e) With the above modifications we get the following backtrace:

ys:1
ys:cache start f4ffff8140005380, mtag:fe, page_address:ffffff8140328000
ys:cache change f4ffff8140005380, mtag:fe, page_address:ffffff8140328000
ys: error address:f4ffff8140005398
Pointer tag: [f4], memory tag: [fe]
CPU: 0 PID: 100 Comm: kunit_try_catch Tainted:
Call trace:
 dump_backtrace.cfi_jt+0x0/0x8
 show_stack+0x28/0x38
 dump_stack_lvl+0x68/0x98
 __kasan_report+0x110/0x29c
 kasan_report+0x40/0x8c
 __do_kernel_fault+0xd4/0x2c4
 do_bad_area+0x40/0x100
 do_tag_check_fault+0x2c/0x40
 do_mem_abort+0x74/0x138
 el1_abort+0x40/0x64
 el1h_64_sync_handler+0x60/0xa0
 el1h_64_sync+0x7c/0x80
 print_address_description+0x154/0x2e8
 __kasan_report+0x200/0x29c
 kasan_report+0x40/0x8c
 __do_kernel_fault+0xd4/0x2c4
 do_bad_area+0x40/0x100
 do_tag_check_fault+0x2c/0x40
 do_mem_abort+0x74/0x138
 el1_abort+0x40/0x64
 el1h_64_sync_handler+0x60/0xa0
 el1h_64_sync+0x7c/0x80
 enqueue_entity+0x23c/0x4b8
 enqueue_task_fair+0x13c/0x48c
 enqueue_task.llvm.1684042887774774428+0xd0/0x250
 __do_set_cpus_allowed+0x1ac/0x304
 __set_cpus_allowed_ptr_locked+0x168/0x28c
 migrate_enable+0xf0/0x17c
 kasan_strings+0x59c/0x72c
 kunit_try_run_case+0x84/0x128
 kunit_generic_run_threadfn_adapter+0x48/0x80
 kthread+0x17c/0x1e8
 ret_from_fork+0x10/0x20
ys:cache end f4ffff8140005380, object ffffff814032ca00, page_address:ffffff8140328000

f) From the above log you can see that the system called kasan_report()
   twice. The report code accesses the kmem_cache through a tagged
   address, and the memory tag of the kmem_cache had changed (pointer
   tag f4, memory tag fe), so that access raised a second tag check
   fault. Normally this does not happen easily.
   So I think we can apply kasan_reset_tag() to the kmem_cache address.
   For example, the following change is for the latest kernel version:

diff --git a/mm/kasan/report.c b/mm/kasan/report.c
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -412,7 +412,7 @@ static void complete_report_info(struct kasan_report_info *info)
 	slab = kasan_addr_to_slab(addr);
 	if (slab) {
-		info->cache = slab->slab_cache;
+		info->cache = kasan_reset_tag(slab->slab_cache);
 		info->object = nearest_obj(info->cache, slab, addr);

I have tested kernel 5.15 with a similar change and it seems to work.
There may well be other solutions, and I hope to get your feedback.
Thanks a lot.

> > > Then we can also remove the comment in kasan_report_invalid_free().
> > >
> > > It looks like kasan_disable_current() in kmemleak needs to affect
> > > HW_TAGS mode as well:
> > > https://elixir.bootlin.com/linux/v6.2-rc7/source/mm/kmemleak.c#L301
> >
> > It uses kasan_reset_tag, so it should work properly with HW_TAGS.
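
For reference, point f) can be illustrated with a rough userspace model.
This is only a sketch, not kernel code: ptr_tag(), reset_tag() and
access_faults() are names made up for this example, the tag is treated
as the whole top byte of the pointer (as it appears in the log), and the
tag check is simplified to "the pointer tag must equal the memory tag
unless it is the match-all tag 0xff". Under those assumptions it shows
why the kmem_cache pointer f4ffff8140005380 faults against memory tag fe,
and why the same pointer with its tag reset does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TAG_SHIFT     56
#define TAG_MATCH_ALL 0xffu /* modelled on KASAN_TAG_KERNEL */

/* Return the tag carried in the top byte of a pointer value. */
static uint8_t ptr_tag(uint64_t ptr)
{
	return (uint8_t)(ptr >> TAG_SHIFT);
}

/*
 * Hypothetical stand-in for kasan_reset_tag() on a kernel address:
 * force the top byte to the match-all tag. The model only covers
 * kernel addresses, i.e. pointers with bit 55 set.
 */
static uint64_t reset_tag(uint64_t ptr)
{
	assert(ptr & (1ULL << 55));
	return ptr | ((uint64_t)TAG_MATCH_ALL << TAG_SHIFT);
}

/*
 * Simplified tag check: an access faults when the pointer tag is not
 * the match-all tag and differs from the tag stored for the memory.
 */
static bool access_faults(uint64_t ptr, uint8_t mem_tag)
{
	uint8_t tag = ptr_tag(ptr);

	return tag != TAG_MATCH_ALL && tag != mem_tag;
}
```

With the values from the log, access_faults(0xf4ffff8140005380, 0xfe) is
true, which corresponds to the nested fault inside
print_address_description(), while
access_faults(reset_tag(0xf4ffff8140005380), 0xfe) is false, which is
the effect the proposed kasan_reset_tag() change relies on.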