On Thu, 2023-02-02 at 13:59 +0100, Andrey Konovalov wrote:
> On Thu, Feb 2, 2023 at 6:25 AM Kuan-Ying Lee (李冠穎)
> <Kuan-Ying.Lee@xxxxxxxxxxxx> wrote:
> >
> > On Fri, 2022-06-10 at 16:21 +0100, Catalin Marinas wrote:
> > > Hi,
> > >
> > > That's a second attempt on fixing the race between setting the
> > > allocation (in-memory) tags in a page and the corresponding logical
> > > tag in page->flags. Initial version here:
> > >
> > > https://lore.kernel.org/r/20220517180945.756303-1-catalin.marinas@xxxxxxx
> > >
> > > This new series does not introduce any new GFP flags but instead
> > > always skips unpoisoning of the user pages (we already skip the
> > > poisoning on free). Any unpoisoned page will have the page->flags
> > > tag reset.
> > >
> > > For the background:
> > >
> > > On a system with MTE and KASAN_HW_TAGS enabled, when a page is
> > > allocated kasan_unpoison_pages() sets a random tag and saves it in
> > > page->flags so that page_to_virt() re-creates the correct tagged
> > > pointer. We need to ensure that the in-memory tags are visible
> > > before setting the page->flags:
> > >
> > > P0 (__kasan_unpoison_range):    P1 (access via virt_to_page):
> > >   Wtags=x                         Rflags=x
> > >     |                               |
> > >     | DMB                           | address dependency
> > >     V                               V
> > >   Wflags=x                        Rtags=x
> > >
> > > The first patch changes the order of page unpoisoning with the tag
> > > storing in page->flags. page_kasan_tag_set() has the right barriers
> > > through try_cmpxchg().
> > >
> > > If a page is mapped in user-space with PROT_MTE, the architecture
> > > code will set the allocation tag to 0 and a subsequent
> > > page_to_virt() dereference will fault. We currently try to fix this
> > > by resetting the tag in page->flags so that it is 0xff (match-all,
> > > not faulting). However, setting the tags and flags can race with
> > > another CPU reading the flags (page_to_virt()) and barriers can't
> > > help, e.g.:
> > >
> > > P0 (mte_sync_page_tags):        P1 (memcpy from virt_to_page):
> > >                                   Rflags!=0xff
> > >   Wflags=0xff
> > >   DMB (doesn't help)
> > >   Wtags=0
> > >                                   Rtags=0   // fault
> > >
> > > Since clearing the flags in the arch code doesn't work, do this at
> > > page allocation time when __GFP_SKIP_KASAN_UNPOISON is passed.
> > >
> > > Thanks.
> > >
> > > Catalin Marinas (4):
> > >   mm: kasan: Ensure the tags are visible before the tag in page->flags
> > >   mm: kasan: Skip unpoisoning of user pages
> > >   mm: kasan: Skip page unpoisoning only if __GFP_SKIP_KASAN_UNPOISON
> > >   arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags"
> > >
> > >  arch/arm64/kernel/hibernate.c |  5 -----
> > >  arch/arm64/kernel/mte.c       |  9 ---------
> > >  arch/arm64/mm/copypage.c      |  9 ---------
> > >  arch/arm64/mm/fault.c         |  1 -
> > >  arch/arm64/mm/mteswap.c       |  9 ---------
> > >  include/linux/gfp.h           |  2 +-
> > >  mm/kasan/common.c             |  3 ++-
> > >  mm/page_alloc.c               | 19 ++++++++++---------
> > >  8 files changed, 13 insertions(+), 44 deletions(-)
> > >
> >
> > Hi kasan maintainers,
> >
> > We hit the following issue on the android-6.1 devices with MTE and HW
> > tag kasan enabled.
> >
> > I observe that the anon flag doesn't have skip_kasan_poison and
> > skip_kasan_unpoison flag and kasantag is weird.
> >
> > AFAIK, kasantag of anon flag needs to be 0x0.
> >
> > [   71.953938] [T1403598] FramePolicy: [name:report&]==================================================================
> > [   71.955305] [T1403598] FramePolicy: [name:report&]BUG: KASAN: invalid-access in copy_page+0x10/0xd0
> > [   71.956476] [T1403598] FramePolicy: [name:report&]Read at addr f0ffff81332a8000 by task FramePolicy/3598
> > [   71.957673] [T1403598] FramePolicy: [name:report_hw_tags&]Pointer tag: [f0], memory tag: [ff]
> > [   71.958746] [T1403598] FramePolicy: [name:report&]
> > [   71.959354] [T1403598] FramePolicy: CPU: 4 PID: 3598 Comm: FramePolicy Tainted: G S W OE 6.1.0-mainline-android14-0-ga8a53f83b9e4 #1
> > [   71.960978] [T1403598] FramePolicy: Hardware name: MT6985(ENG) (DT)
> > [   71.961767] [T1403598] FramePolicy: Call trace:
> > [   71.962338] [T1403598] FramePolicy:  dump_backtrace+0x108/0x158
> > [   71.963097] [T1403598] FramePolicy:  show_stack+0x20/0x48
> > [   71.963782] [T1403598] FramePolicy:  dump_stack_lvl+0x6c/0x88
> > [   71.964512] [T1403598] FramePolicy:  print_report+0x2cc/0xa64
> > [   71.965263] [T1403598] FramePolicy:  kasan_report+0xb8/0x138
> > [   71.965986] [T1403598] FramePolicy:  __do_kernel_fault+0xd4/0x248
> > [   71.966782] [T1403598] FramePolicy:  do_bad_area+0x38/0xe8
> > [   71.967484] [T1403598] FramePolicy:  do_tag_check_fault+0x24/0x38
> > [   71.968261] [T1403598] FramePolicy:  do_mem_abort+0x48/0xb0
> > [   71.968973] [T1403598] FramePolicy:  el1_abort+0x44/0x68
> > [   71.969646] [T1403598] FramePolicy:  el1h_64_sync_handler+0x68/0xb8
> > [   71.970440] [T1403598] FramePolicy:  el1h_64_sync+0x68/0x6c
> > [   71.971146] [T1403598] FramePolicy:  copy_page+0x10/0xd0
> > [   71.971824] [T1403598] FramePolicy:  copy_user_highpage+0x20/0x40
> > [   71.972603] [T1403598] FramePolicy:  wp_page_copy+0xd0/0x9f8
> > [   71.973344] [T1403598] FramePolicy:  do_wp_page+0x374/0x3b0
> > [   71.974056] [T1403598] FramePolicy:  handle_mm_fault+0x3ec/0x119c
> > [   71.974833] [T1403598] FramePolicy:  do_page_fault+0x344/0x4ac
> > [   71.975583] [T1403598] FramePolicy:  do_mem_abort+0x48/0xb0
> > [   71.976294] [T1403598] FramePolicy:  el0_da+0x4c/0xe0
> > [   71.976934] [T1403598] FramePolicy:  el0t_64_sync_handler+0xd4/0xfc
> > [   71.977725] [T1403598] FramePolicy:  el0t_64_sync+0x1a0/0x1a4
> > [   71.978451] [T1403598] FramePolicy: [name:report&]
> > [   71.979057] [T1403598] FramePolicy: [name:report&]The buggy address belongs to the physical page:
> > [   71.980173] [T1403598] FramePolicy: [name:debug&]page:fffffffe04ccaa00 refcount:14 mapcount:13 mapping:0000000000000000 index:0x7884c74 pfn:0x1732a8
> > [   71.981849] [T1403598] FramePolicy: [name:debug&]memcg:faffff80c0241000
> > [   71.982680] [T1403598] FramePolicy: [name:debug&]anon flags: 0x43c000000048003e(referenced|uptodate|dirty|lru|active|swapbacked|arch_2|zone=1|kasantag=0xf)
> > [   71.984446] [T1403598] FramePolicy: raw: 43c000000048003e fffffffe04b99648 fffffffe04cca308 f2ffff8103390831
> > [   71.985684] [T1403598] FramePolicy: raw: 0000000007884c74 0000000000000000 0000000e0000000c faffff80c0241000
> > [   71.986919] [T1403598] FramePolicy: [name:debug&]page dumped because: kasan: bad access detected
> > [   71.988022] [T1403598] FramePolicy: [name:report&]
> > [   71.988624] [T1403598] FramePolicy: [name:report&]Memory state around the buggy address:
> > [   71.989641] [T1403598] FramePolicy:  ffffff81332a7e00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
> > [   71.990811] [T1403598] FramePolicy:  ffffff81332a7f00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
> > [   71.991982] [T1403598] FramePolicy: >ffffff81332a8000: ff ff ff ff f0 f0 fc fc fc fc fc fc fc f0 f0 f3
> > [   71.993149] [T1403598] FramePolicy: [name:report&]     ^
> > [   71.993972] [T1403598] FramePolicy:  ffffff81332a8100: f3 f3 f3 f3 f3 f3 f0 f0 f8 f8 f8 f8 f8 f8 f8 f0
> > [   71.995141] [T1403598] FramePolicy:  ffffff81332a8200: f0 fb fb fb fb fb fb fb f0 f0 fe fe fe fe fe fe
> > [   71.996332] [T1403598] FramePolicy: [name:report&]==================================================================
> >
> > Originally, I suspect that some userspace pages have been migrated so
> > the page->flags will be lost and page->flags is re-generated by
> > alloc_pages().
>
> Hi Kuan-Ying,
>
> There recently was a similar crash due to incorrectly implemented
> sampling.
>
> Do you have the following patch in your tree?
>
> https://android.googlesource.com/kernel/common/+/9f7f5a25f335e6e1484695da9180281a728db7e2
>
> If not, please sync your 6.1 tree with the Android common kernel.
> Hopefully this will fix the issue.
>
> Thanks!

Hi Andrey,

Thanks for your advice.

I see that this patch is a fix for
("kasan: allow sampling page_alloc allocations for HW_TAGS").

But our 6.1 tree does not currently have the following two commits:
("FROMGIT: kasan: allow sampling page_alloc allocations for HW_TAGS")
("FROMLIST: kasan: reset page tags properly with sampling")
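
For reference, here is a small sketch of how the kasantag=0xf in the quoted page dump relates to the pointer tag f0 in the report. The kernel stores the per-page KASAN tag in page->flags XOR'ed with 0xff, so a field value of 0x0 corresponds to the match-all tag 0xff. The shift of 54 is an assumption inferred from this particular dump (zone=1 in the top two bits, tag field right below it); it depends on the kernel configuration, and the names below are made up for illustration only.

```python
# Assumption: in this configuration the KASAN tag field is 8 bits wide and
# starts at bit 54 of page->flags. This is inferred from the dump, not read
# from the kernel config, and will differ on other configurations.
KASAN_TAG_SHIFT = 54   # hypothetical name, illustration only
KASAN_TAG_MASK = 0xFF


def stored_kasantag(flags: int) -> int:
    """Raw kasantag field, as printed in the page dump."""
    return (flags >> KASAN_TAG_SHIFT) & KASAN_TAG_MASK


def pointer_tag(flags: int) -> int:
    """Tag a page_to_virt() pointer would carry: the stored field XOR 0xff."""
    return stored_kasantag(flags) ^ 0xFF


flags = 0x43C000000048003E          # "anon flags" value from the report
print(hex(stored_kasantag(flags)))  # field as shown in the dump
print(hex(pointer_tag(flags)))      # tag in the top byte of the faulting address
```

Under that assumption the flags value decodes to a stored field of 0xf and a pointer tag of 0xf0, which matches the top byte of the faulting address f0ffff81332a8000, while the memory tag at that address is ff. A page whose tag had been reset would show kasantag=0x0 and give a 0xff pointer tag.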