Re: Warning on mremapped uffd-wp memory

Ryan Roberts <ryan.roberts@xxxxxxx> · Tue, 1 Oct 2024 16:31:31 +0100

On 01/10/2024 16:10, Peter Xu wrote:
> On Tue, Oct 01, 2024 at 03:27:48PM +0100, Ryan Roberts wrote:
>> Hi Peter,
>>
>> On 08/08/2024 12:25, Ryan Roberts wrote:
>>> On 07/08/2024 19:59, Peter Xu wrote:
>>>> On Wed, Aug 07, 2024 at 12:18:18PM +0200, David Hildenbrand wrote:
>>>>> On 07.08.24 10:58, David Hildenbrand wrote:
>>>>>> On 06.08.24 22:29, Peter Xu wrote:
>>>>>>> On Tue, Aug 06, 2024 at 06:37:55PM +0200, David Hildenbrand wrote:
>>>>>>>> On 06.08.24 17:15, Ryan Roberts wrote:
>>>>>>>>> Hi Peter, David,
>>>>>>>
>>>>>>> Hi, Ryan,
>>>>>>>
>>>>>>>>>
>>>>>>>>> syzkaller has found an issue (at least on arm64, but I suspect it will be
>>>>>>>>> visible on x86_64 too) that triggers the following warning:
>>>>>>>
>>>>>>> This is true.  I can easily reproduce..
>>>>>>>
>>
>> [...]
>>
>>>> Looking at this specific issue again, it's more than just ptes that
>>>> need the uffd-wp bit removed.  We have:
>>>>
>>>>   - pmd/pud/hugetlb in other paths that will need similar care..
>>>>
>>>>   - move_page_tables() smartness on HAVE_MOVE_PUD.. where we may need to
>>>>     walk the pmd page removing the bits when necessary..
>>>>
>>>>   - more importantly, mremap_userfaultfd_prep() might be too late if it's
>>>>     after moving pgtables..
>>>>
>>>>   - [not yet started looking] the mlock issue Ryan mentioned..
>>>>
>>>> Looks like we'll need more things to fix and test..
>>>>
>>>> I wish I could simply disable UFFD_WP + EVENT_REMAP, but I think even
>>>> with that, mremap() should by default still logically tear down all
>>>> those uffd-wp bits, which is the same as !EVENT_REMAP now..
>>>>
>>>> Let me know if anyone would like to beat me to it on fixing the whole
>>>> thing, I'd be more than happy..  
>>>
>>> Afraid I won't be able to sign up to doing that work.
>>>
>>>> Otherwise, I'll probably need to postpone
>>>> the fix of this issue for 1-2 weeks but finish some other things first..
>>
>> I'm not sure if there was any progress on this? We are still seeing the problem
>> on v6.12-rc1.
> 
> Hi, Ryan,
> 
> I haven't had free time to look at this yet, sorry.  I confess I didn't
> prioritize it highly, as I doubt anyone would make real use of it or
> hit this issue in real workloads, and a fix would slow down generic
> workloads, even if only slightly.

No problem, I'm really just acting as the middle man. Now that -rc1 is out, Mark
has been running his usual fuzzing and noted that the issue still exists, so I
thought I'd enquire to see if you had been able to make any progress. I agree
it's not high priority. However, on a panic_on_warn=1 kernel (which I understand
some use in deployment), this means that user space can panic the system, so I
guess it needs to be addressed eventually.

> 
> Do you want to have a look?  It'll be great if so.  Or I can try to find
> some time this month.

I won't personally get time to look at this, since I'm busy with some other
commitments. But I might be able to find someone to look into it. Leave it with
me for now.

Thanks,
Ryan
