Hi Catalin,

On Fri, Feb 10, 2023 at 10:28 AM Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
>
> Hi Peter,
>
> On Thu, Feb 09, 2023 at 10:19:20PM -0800, Peter Collingbourne wrote:
> > Thanks for the information. We encountered a similar issue internally
> > with the Android 5.15 common kernel. We tracked it down to an issue
> > with page migration, where the source page was a userspace page with
> > MTE tags, and the target page was allocated using KASAN (i.e. having
> > a non-zero KASAN tag). This caused tag check faults when the page was
> > subsequently accessed by the kernel as a result of the mismatching tags
> > from userspace. Given the number of different ways that page migration
> > target pages can be allocated, the simplest fix that we could think of
> > was to synchronize the KASAN tag in copy_highpage().
> >
> > Can you try the patch below and let us know whether it fixes the issue?
> >
> > diff --git a/arch/arm64/mm/copypage.c b/arch/arm64/mm/copypage.c
> > index 24913271e898c..87ed38e9747bd 100644
> > --- a/arch/arm64/mm/copypage.c
> > +++ b/arch/arm64/mm/copypage.c
> > @@ -23,6 +23,8 @@ void copy_highpage(struct page *to, struct page *from)
> >
> >  	if (system_supports_mte() && test_bit(PG_mte_tagged, &from->flags)) {
> >  		set_bit(PG_mte_tagged, &to->flags);
> > +		if (kasan_hw_tags_enabled())
> > +			page_kasan_tag_set(to, page_kasan_tag(from));
> >  		mte_copy_page_tags(kto, kfrom);
>
> Why not just page_kasan_tag_reset(to)? If PG_mte_tagged is set on the
> 'from' page, the tags are random anyway and page_kasan_tag(from) should
> already be 0xff. It makes more sense to do the same for the 'to' page
> rather than copying the tag from the 'from' page. IOW, we are copying
> user-controlled tags into a page, the kernel should have a match-all tag
> in page->flags.

That would also work, but I was thinking that if copy_highpage() were
being used to copy a KASAN page we should keep the original tag in order
to maintain tag checks for page accesses.
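A toy user-space model may help show why the migration target faults and how the two options compare. It is not kernel code. `struct model_page`, `model_copy_highpage()`, `MATCH_ALL_TAG` and the 4-granule page are all hypothetical. The tag check is also simplified: a KASAN tag of 0xff is treated as match-all, mirroring KASAN_TAG_KERNEL. Any other tag's low 4 bits must equal the granule's MTE tag. The model compares three cases: doing nothing, copying the source page's KASAN tag (the patch above), and resetting the target to match-all (the page_kasan_tag_reset() alternative).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: names and sizes are illustrative, not kernel definitions. */
#define MATCH_ALL_TAG 0xff     /* mirrors KASAN_TAG_KERNEL (match-all pointer tag) */
#define GRANULES_PER_PAGE 4    /* toy page; a real page has PAGE_SIZE / 16 granules */

struct model_page {
	uint8_t kasan_tag;                   /* tag kept in page->flags for kernel pointers */
	uint8_t mem_tags[GRANULES_PER_PAGE]; /* 4-bit MTE allocation tags in tag storage */
	bool mte_tagged;                     /* stands in for PG_mte_tagged */
};

/* Simplified check: match-all passes; otherwise low nibble must equal the memory tag. */
static bool kernel_access_ok(const struct model_page *p, int granule)
{
	return p->kasan_tag == MATCH_ALL_TAG ||
	       (p->kasan_tag & 0xf) == p->mem_tags[granule];
}

enum fix { FIX_NONE, FIX_COPY_TAG, FIX_RESET_TAG };

/* Models only the tag handling of copy_highpage(); the data copy is omitted. */
static void model_copy_highpage(struct model_page *to,
				const struct model_page *from, enum fix fix)
{
	if (from->mte_tagged) {
		to->mte_tagged = true;
		if (fix == FIX_COPY_TAG)
			to->kasan_tag = from->kasan_tag; /* proposed patch */
		else if (fix == FIX_RESET_TAG)
			to->kasan_tag = MATCH_ALL_TAG;   /* page_kasan_tag_reset() option */
		for (int i = 0; i < GRANULES_PER_PAGE; i++)
			to->mem_tags[i] = from->mem_tags[i]; /* mte_copy_page_tags() */
	}
}

/* The scenario in the thread: a user page with arbitrary MTE tags (and a match-all
 * KASAN tag) migrated into a target page allocated with a non-match-all KASAN tag. */
static bool all_accesses_ok(enum fix fix)
{
	struct model_page from = { .kasan_tag = MATCH_ALL_TAG,
				   .mem_tags = { 0x1, 0x7, 0x3, 0xa },
				   .mte_tagged = true };
	struct model_page to = { .kasan_tag = 0xf5,
				 .mem_tags = { 0x5, 0x5, 0x5, 0x5 },
				 .mte_tagged = false };

	model_copy_highpage(&to, &from, fix);
	for (int i = 0; i < GRANULES_PER_PAGE; i++)
		if (!kernel_access_ok(&to, i))
			return false;
	return true;
}
```

With FIX_NONE, the target keeps its 0xf5 tag while its memory now carries the user's tags, so the model reports a mismatch. Both fixes pass here because, as noted above, the source's KASAN tag is already 0xff. The two options would only diverge if the source page carried a non-match-all KASAN tag.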
> > Catalin, please let us know what you think of the patch above. It
> > effectively partially undoes commit 20794545c146 ("arm64: kasan: Revert
> > "arm64: mte: reset the page tag in page->flags""), but this seems okay
> > to me because the mentioned race condition shouldn't affect "new" pages
> > such as those being used as migration targets. The smp_wmb() that was
> > there before doesn't seem necessary for the same reason.
> >
> > If the patch is okay, we should apply it to the 6.1 stable kernel. The
> > problem appears to be "fixed" in the mainline kernel because of
> > a bad merge conflict resolution on my part; when I rebased commit
> > e059853d14ca ("arm64: mte: Fix/clarify the PG_mte_tagged semantics")
> > past commit 20794545c146, it looks like I accidentally brought back the
> > page_kasan_tag_reset() line removed in the latter. But we should align
> > the mainline kernel with whatever we decide to do on 6.1.
>
> Happy accident ;). When I reverted such calls in commit 20794545c146, my
> assumption was that we always get a page that went through
> post_alloc_hook() and the tags were reset. But it seems that's not
> always the case (and probably wasteful anyway if we have to zero the
> tags and data on a page we know we are going to override via
> copy_highpage() anyway). The barrier doesn't help, so we shouldn't add
> it back.
>
> So, I'm fine with a stable fix but I wonder whether we should backport
> the whole "Fix/clarify the PG_mte_tagged semantics" series instead.

That seems fine to me (possibly together with the above patch, if we
decide to copy the tag).

Peter