On Tue, Oct 01, 2024 at 04:31:31PM +0100, Ryan Roberts wrote:
> On 01/10/2024 16:10, Peter Xu wrote:
> > On Tue, Oct 01, 2024 at 03:27:48PM +0100, Ryan Roberts wrote:
> >> Hi Peter,
> >>
> >> On 08/08/2024 12:25, Ryan Roberts wrote:
> >>> On 07/08/2024 19:59, Peter Xu wrote:
> >>>> On Wed, Aug 07, 2024 at 12:18:18PM +0200, David Hildenbrand wrote:
> >>>>> On 07.08.24 10:58, David Hildenbrand wrote:
> >>>>>> On 06.08.24 22:29, Peter Xu wrote:
> >>>>>>> On Tue, Aug 06, 2024 at 06:37:55PM +0200, David Hildenbrand wrote:
> >>>>>>>> On 06.08.24 17:15, Ryan Roberts wrote:
> >>>>>>>>> Hi Peter, David,
> >>>>>>>
> >>>>>>> Hi, Ryan,
> >>>>>>>
> >>>>>>>>>
> >>>>>>>>> syzkaller has found an issue (at least on arm64, but I suspect it
> >>>>>>>>> will be visible on x86_64 too) that triggers the following warning:
> >>>>>>>
> >>>>>>> This is true. I can easily reproduce..
> >>>>>>>
> >>
> >> [...]
> >>
> >>>> Looking at this specific issue again, it's more than PTEs that need
> >>>> the uffd-wp bit removed. We have:
> >>>>
> >>>> - pmd/pud/hugetlb in other paths that will need similar care..
> >>>>
> >>>> - move_page_tables() smartness on HAVE_MOVE_PUD.. where we may need to
> >>>>   walk the pmd page removing the bits when necessary..
> >>>>
> >>>> - more importantly, mremap_userfaultfd_prep() might be too late if it
> >>>>   runs after moving the pgtables..
> >>>>
> >>>> - [not yet started looking] the mlock issue Ryan mentioned..
> >>>>
> >>>> Looks like we'll need more things to fix and test..
> >>>>
> >>>> I wish I could simply disable UFFD_WP + EVENT_REMAP, but I think even
> >>>> with that, by default mremap() should still logically tear down all
> >>>> those uffd-wp bits, which is the same behaviour as !EVENT_REMAP now..
> >>>>
> >>>> Let me know if anyone would like to beat me to it on fixing the whole
> >>>> thing, I'd be more than happy..
> >>>
> >>> Afraid I won't be able to sign up to doing that work.
> >>>
> >>>> Otherwise, I'll probably need to postpone the fix of this issue for
> >>>> 1-2 weeks but finish some other things first..
> >>
> >> I'm not sure if there was any progress on this? We are still seeing the
> >> problem on v6.12-rc1.
> >
> > Hi, Ryan,
> >
> > I haven't yet got free time to look at this, sorry. I confess I didn't
> > prioritize it highly, as I doubt anyone would make real use of it, or
> > hit this issue in real workloads, and the fix will even slow down
> > generic workloads, if only slightly.
>
> No problem; I'm really just acting as the middleman. Given -rc1 is out,
> Mark has been running his usual fuzzing and noted that the issue still
> exists. So I thought I'd just enquire to see if you were able to make any
> progress. I agree it's not high priority. Although for a panic_on_warn=1
> kernel (which I understand some use in deployment), this means that user
> space can panic the system, so I guess it needs to be addressed
> eventually.
>
> >
> > Do you want to have a look? It'll be great if so. Or I can try to find
> > some time this month.
>
> I won't personally get time to look at this, since I'm busy with some
> other commitments. But I might be able to find someone to look into it.
> Leave it with me for now.

Thank you! If there are patches, I can definitely try to review them. Or if
this won't get addressed before someone else pokes again, I'll do it.

Thanks,

-- 
Peter Xu