On Wed, Feb 08, 2023 at 05:41:45AM +0000, Qun-wei Lin (林群崴) wrote: > On Fri, 2023-02-03 at 18:51 +0100, Andrey Konovalov wrote: > > On Fri, Feb 3, 2023 at 4:41 AM Kuan-Ying Lee (李冠穎) > > <Kuan-Ying.Lee@xxxxxxxxxxxx> wrote: > > > > > > > Hi Kuan-Ying, > > > > > > > > There recently was a similar crash due to incorrectly implemented > > > > sampling. > > > > > > > > Do you have the following patch in your tree? > > > > > > > > > > > > > > > https://urldefense.com/v3/__https://android.googlesource.com/kernel/common/*/9f7f5a25f335e6e1484695da9180281a728db7e2__;Kw!!CTRNKA9wMg0ARbw!hUjRlXirPMSusdIWe0RIPt0PNqIHYDCJyd7GSd4o-TgLMP0CKRUkjElH-jcvtaz42-sgE2U58964rCCbuNTJE5Jx$ > > > > > > > > > > > > If not, please sync your 6.1 tree with the Android common kernel. > > > > Hopefully this will fix the issue. > > > > > > > > Thanks! > > > > > > Hi Andrey, > > > > > > Thanks for your advice. > > > > > > I saw this patch is to fix ("kasan: allow sampling page_alloc > > > allocations for HW_TAGS"). > > > > > > But our 6.1 tree doesn't have following two commits now. > > > ("FROMGIT: kasan: allow sampling page_alloc allocations for > > > HW_TAGS") > > > (FROMLIST: kasan: reset page tags properly with sampling) > > > > Hi Kuan-Ying, > > > > Hi Andrey, > I'll stand in for Kuan-Ying as he's out of office. > Thanks for your help! > > > Just to clarify: these two patches were applied twice: once here on > > Jan 13: > > > > > https://urldefense.com/v3/__https://android.googlesource.com/kernel/common/*/a2a9e34d164e90fc08d35fd097a164b9101d72ef__;Kw!!CTRNKA9wMg0ARbw!kE1XiSmunRcQb9rTpKGkFc1EFJA57qr1cj7v9EZAjUBzXcSzMl-ofCI2mdtEQsxn3J4n7Lkgxb0_G745_3oO-3k$ ; > > > > > https://urldefense.com/v3/__https://android.googlesource.com/kernel/common/*/435e2a6a6c8ba8d0eb55f9aaade53e7a3957322b__;Kw!!CTRNKA9wMg0ARbw!kE1XiSmunRcQb9rTpKGkFc1EFJA57qr1cj7v9EZAjUBzXcSzMl-ofCI2mdtEQsxn3J4n7Lkgxb0_G745sDEOYWY$ ; > > > > > > Our codebase does not contain these two patches. > > > but then reverted here on Jan 20: > > > > > https://urldefense.com/v3/__https://android.googlesource.com/kernel/common/*/5503dbe454478fe54b9cac3fc52d4477f52efdc9__;Kw!!CTRNKA9wMg0ARbw!kE1XiSmunRcQb9rTpKGkFc1EFJA57qr1cj7v9EZAjUBzXcSzMl-ofCI2mdtEQsxn3J4n7Lkgxb0_G745Bl77dFY$ ; > > > > > https://urldefense.com/v3/__https://android.googlesource.com/kernel/common/*/4573a3cf7e18735a477845426238d46d96426bb6__;Kw!!CTRNKA9wMg0ARbw!kE1XiSmunRcQb9rTpKGkFc1EFJA57qr1cj7v9EZAjUBzXcSzMl-ofCI2mdtEQsxn3J4n7Lkgxb0_G745K-J8O-w$ ; > > > > > > And then once again via the link I sent before together with a fix on > > Jan 25. > > > > It might be that you still have to former two patches in your tree if > > you synced it before the revert. > > > > However, if this is not the case: > > > > Which 6.1 commit is your tree based on? > > > https://android.googlesource.com/kernel/common/+/53b3a7721b7aec74d8fa2ee55c2480044cc7c1b8 > (53b3a77 Merge 6.1.1 into android14-6.1) is the latest commit in our > tree. > > > Do you have any private MTE-related changes in the kernel? > > No, all the MTE-related code is the same as Android Common Kernel. > > > Do you have userspace MTE enabled? > > Yes, we have enabled MTE for both EL1 and EL0. Hi Qun-wei, Thanks for the information. We encountered a similar issue internally with the Android 5.15 common kernel. We tracked it down to an issue with page migration, where the source page was a userspace page with MTE tags, and the target page was allocated using KASAN (i.e. having a non-zero KASAN tag). This caused tag check faults when the page was subsequently accessed by the kernel as a result of the mismatching tags from userspace. Given the number of different ways that page migration target pages can be allocated, the simplest fix that we could think of was to synchronize the KASAN tag in copy_highpage(). Can you try the patch below and let us know whether it fixes the issue? diff --git a/arch/arm64/mm/copypage.c b/arch/arm64/mm/copypage.c index 24913271e898c..87ed38e9747bd 100644 --- a/arch/arm64/mm/copypage.c +++ b/arch/arm64/mm/copypage.c @@ -23,6 +23,8 @@ void copy_highpage(struct page *to, struct page *from) if (system_supports_mte() && test_bit(PG_mte_tagged, &from->flags)) { set_bit(PG_mte_tagged, &to->flags); + if (kasan_hw_tags_enabled()) + page_kasan_tag_set(to, page_kasan_tag(from)); mte_copy_page_tags(kto, kfrom); } } Catalin, please let us know what you think of the patch above. It effectively partially undoes commit 20794545c146 ("arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags""), but this seems okay to me because the mentioned race condition shouldn't affect "new" pages such as those being used as migration targets. The smp_wmb() that was there before doesn't seem necessary for the same reason. If the patch is okay, we should apply it to the 6.1 stable kernel. The problem appears to be "fixed" in the mainline kernel because of a bad merge conflict resolution on my part; when I rebased commit e059853d14ca ("arm64: mte: Fix/clarify the PG_mte_tagged semantics") past commit 20794545c146, it looks like I accidentally brought back the page_kasan_tag_reset() line removed in the latter. But we should align the mainline kernel with whatever we decide to do on 6.1. Peter