On Tue, Oct 01, 2024 at 03:27:48PM +0100, Ryan Roberts wrote:
> Hi Peter,
>
> On 08/08/2024 12:25, Ryan Roberts wrote:
> > On 07/08/2024 19:59, Peter Xu wrote:
> >> On Wed, Aug 07, 2024 at 12:18:18PM +0200, David Hildenbrand wrote:
> >>> On 07.08.24 10:58, David Hildenbrand wrote:
> >>>> On 06.08.24 22:29, Peter Xu wrote:
> >>>>> On Tue, Aug 06, 2024 at 06:37:55PM +0200, David Hildenbrand wrote:
> >>>>>> On 06.08.24 17:15, Ryan Roberts wrote:
> >>>>>>> Hi Peter, David,
> >>>>>
> >>>>> Hi, Ryan,
> >>>>>
> >>>>>>>
> >>>>>>> syzkaller has found an issue (at least on arm64, but I suspect it will be
> >>>>>>> visible on x86_64 too) that triggers the following warning:
> >>>>>
> >>>>> This is true. I can easily reproduce..
> >>>>>
>
> [...]
>
> >> When I'm looking at this specific issue again, it's more than ptes that
> >> should need to remove the uffd-wp bit. We have:
> >>
> >>   - pmd/pud/hugetlb in other paths that will need similar care..
> >>
> >>   - move_page_tables() smartness on HAVE_MOVE_PUD.. where we may need to
> >>     walk the pmd page removing the bits when necessary..
> >>
> >>   - more importantly, mremap_userfaultfd_prep() might be too late if it's
> >>     after moving pgtables..
> >>
> >>   - [not yet started looking] the mlock issue Ryan mentioned..
> >>
> >> Looks like we'll need more things to fix and test..
> >>
> >> I wished if I can simply disable UFFD_WP + EVENT_REMAP, but I think even
> >> with that, by default when mremap() we should still logically tear down all
> >> those uffd-wp bits which is the same as !EVENT_REMAP now..
> >>
> >> Let me know if anyone would like to beat me to it on fixing the whole
> >> thing, I'd be more than happy..
> >
> > Afraid I won't be able to sign up to doing that work.
> >
> > Otherwise, I'll probably need to postpone
> >> the fix of this issue for 1-2 weeks but finish some other things first..
>
> I'm not sure if there was any progress on this? We are still seeing the problem
> on v6.12-rc1.

Hi, Ryan,

I haven't had free time to look at this yet, sorry.

I confess I didn't give this high priority, as I doubt anyone makes real
use of it or would hit this issue in real workloads, and the fix would
slow down generic workloads, even if only slightly.

Do you want to have a look? It'd be great if so. Otherwise I can try to
find some time this month.

Thanks,

--
Peter Xu