Hi Peter,

On Thu, Feb 09, 2023 at 10:19:20PM -0800, Peter Collingbourne wrote:
> Thanks for the information. We encountered a similar issue internally
> with the Android 5.15 common kernel. We tracked it down to an issue
> with page migration, where the source page was a userspace page with
> MTE tags, and the target page was allocated using KASAN (i.e. having
> a non-zero KASAN tag). This caused tag check faults when the page was
> subsequently accessed by the kernel as a result of the mismatching tags
> from userspace. Given the number of different ways that page migration
> target pages can be allocated, the simplest fix that we could think of
> was to synchronize the KASAN tag in copy_highpage().
>
> Can you try the patch below and let us know whether it fixes the issue?
>
> diff --git a/arch/arm64/mm/copypage.c b/arch/arm64/mm/copypage.c
> index 24913271e898c..87ed38e9747bd 100644
> --- a/arch/arm64/mm/copypage.c
> +++ b/arch/arm64/mm/copypage.c
> @@ -23,6 +23,8 @@ void copy_highpage(struct page *to, struct page *from)
>
>  	if (system_supports_mte() && test_bit(PG_mte_tagged, &from->flags)) {
>  		set_bit(PG_mte_tagged, &to->flags);
> +		if (kasan_hw_tags_enabled())
> +			page_kasan_tag_set(to, page_kasan_tag(from));
>  		mte_copy_page_tags(kto, kfrom);

Why not just page_kasan_tag_reset(to)? If PG_mte_tagged is set on the
'from' page, the tags are random anyway and page_kasan_tag(from) should
already be 0xff. It makes more sense to do the same for the 'to' page
rather than copying the tag from the 'from' page. IOW, since we are
copying user-controlled tags into a page, the kernel should have a
match-all tag in page->flags.

> Catalin, please let us know what you think of the patch above. It
> effectively partially undoes commit 20794545c146 ("arm64: kasan: Revert
> "arm64: mte: reset the page tag in page->flags""), but this seems okay
> to me because the mentioned race condition shouldn't affect "new" pages
> such as those being used as migration targets.
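To make the tag mismatch concrete, here is a toy userspace model in C. It is a hypothetical sketch, not kernel code. Each 16-byte granule carries a 4-bit MTE tag, the kernel's pointer tag comes from a per-page KASAN tag standing in for page->flags, and 0xff is treated as match-all, as described above. The names toy_page, toy_copy_highpage() and toy_kernel_access_ok() are invented for illustration. The reset_tag flag selects between two behaviours: leaving the destination's allocation-time tag in place, or resetting it as page_kasan_tag_reset(to) would.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: sizes and names are illustrative assumptions. */
#define TOY_PAGE_SIZE     4096
#define TOY_GRANULE       16
#define TOY_NGRANULES     (TOY_PAGE_SIZE / TOY_GRANULE)
#define TOY_TAG_MATCH_ALL 0xff

struct toy_page {
	uint8_t kasan_tag;                 /* stands in for page_kasan_tag(page) */
	bool mte_tagged;                   /* stands in for PG_mte_tagged */
	uint8_t mem_tags[TOY_NGRANULES];   /* per-granule allocation tags (4-bit) */
	uint8_t data[TOY_PAGE_SIZE];
};

/*
 * Models a tag-checked kernel access at 'offset'. A pointer tag of 0xff
 * is treated as match-all; otherwise only the low 4 bits are compared
 * against the granule's allocation tag.
 */
static bool toy_kernel_access_ok(const struct toy_page *p, size_t offset)
{
	assert(offset < TOY_PAGE_SIZE);
	if (p->kasan_tag == TOY_TAG_MATCH_ALL)
		return true;
	return (p->kasan_tag & 0xf) == p->mem_tags[offset / TOY_GRANULE];
}

/*
 * Copies data and, for MTE-tagged sources, the user-controlled tags.
 * With reset_tag, the destination's page tag becomes match-all, which is
 * what page_kasan_tag_reset(to) would do in the real copy_highpage().
 */
static void toy_copy_highpage(struct toy_page *to, const struct toy_page *from,
			      bool reset_tag)
{
	memcpy(to->data, from->data, sizeof(to->data));
	if (from->mte_tagged) {
		to->mte_tagged = true;
		if (reset_tag)
			to->kasan_tag = TOY_TAG_MATCH_ALL;
		memcpy(to->mem_tags, from->mem_tags, sizeof(to->mem_tags));
	}
}
```

Suppose the destination page carries a KASAN tag such as 0xf3. Copying the user tags without a reset makes most kernel accesses fail the check. With the reset, every access passes. A user page's tag is expected to be 0xff already, so copying page_kasan_tag(from) as in the quoted patch would give the same value in this model. That is the argument for the simpler, unconditional reset.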
> The smp_wmb() that was there before doesn't seem necessary for the same
> reason.
>
> If the patch is okay, we should apply it to the 6.1 stable kernel. The
> problem appears to be "fixed" in the mainline kernel because of
> a bad merge conflict resolution on my part; when I rebased commit
> e059853d14ca ("arm64: mte: Fix/clarify the PG_mte_tagged semantics")
> past commit 20794545c146, it looks like I accidentally brought back the
> page_kasan_tag_reset() line removed in the latter. But we should align
> the mainline kernel with whatever we decide to do on 6.1.

Happy accident ;).

When I reverted such calls in commit 20794545c146, my assumption was
that we always get a page that went through post_alloc_hook() and the
tags were reset. But it seems that's not always the case (and it is
probably wasteful anyway to zero the tags and data on a page we know we
are going to overwrite via copy_highpage()). The barrier doesn't help,
so we shouldn't add it back.

So, I'm fine with a stable fix, but I wonder whether we should backport
the whole "Fix/clarify the PG_mte_tagged semantics" series instead.

-- 
Catalin
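A second toy model, again hypothetical C rather than kernel code, illustrates the point about wasted work. Suppose the copy routine overwrites every data byte and every granule tag and then resets the page tag itself. In that case, clearing the destination beforehand cannot change the result. Here, toy_clear_page() stands in for the work an allocation hook might do. The model assumes the source page is MTE-tagged, so the reset is unconditional. The names toy_page, toy_clear_page() and toy_copy_page() are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: sizes and names are illustrative assumptions. */
#define TOY_PAGE_SIZE     4096
#define TOY_NGRANULES     (TOY_PAGE_SIZE / 16)
#define TOY_TAG_MATCH_ALL 0xff

struct toy_page {
	uint8_t kasan_tag;                 /* stands in for page_kasan_tag(page) */
	uint8_t mem_tags[TOY_NGRANULES];   /* per-granule allocation tags */
	uint8_t data[TOY_PAGE_SIZE];
};

/* Hypothetical stand-in for zeroing tags and data at allocation time. */
static void toy_clear_page(struct toy_page *p)
{
	p->kasan_tag = TOY_TAG_MATCH_ALL;
	memset(p->mem_tags, 0, sizeof(p->mem_tags));
	memset(p->data, 0, sizeof(p->data));
}

/*
 * Overwrites all data and all granule tags, then resets the page tag
 * (assuming an MTE-tagged source), so prior destination state is irrelevant.
 */
static void toy_copy_page(struct toy_page *to, const struct toy_page *from)
{
	memcpy(to->data, from->data, sizeof(to->data));
	memcpy(to->mem_tags, from->mem_tags, sizeof(to->mem_tags));
	to->kasan_tag = TOY_TAG_MATCH_ALL;
}
```

Two destinations produce the same result: one filled with garbage and one cleared first end up byte-for-byte identical after the copy. In this model, correctness no longer depends on which allocation path produced the target page.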