Re: [RFC v2 PATCH 2/2] mm: mmap: zap pages with read mmap_sem for large mapping

at 4:08 PM, Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx> wrote:

> 
> 
> On 6/19/18 3:17 PM, Nadav Amit wrote:
>> at 4:34 PM, Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx> wrote:
>> 
>> 
>>> When running some mmap/munmap scalability tests with large memory
>>> (i.e. > 300GB), the below hung task issue may happen occasionally.
>>>
>>> INFO: task ps:14018 blocked for more than 120 seconds.
>>>       Tainted: G            E 4.9.79-009.ali3000.alios7.x86_64 #1
>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
>>> message.
>>> ps              D    0 14018      1 0x00000004
>>> 
>>> 
>> (snip)
>> 
>> 
>>> Zapping pages is the most time consuming part. According to the
>>> suggestion from Michal Hocko [1], zapping pages can be done while holding
>>> read mmap_sem, like what MADV_DONTNEED does. Then re-acquire write
>>> mmap_sem to manipulate vmas.
>>> 
>> Does munmap() == MADV_DONTNEED + munmap() ?
> 
> Not exactly the same. So, I basically copied the page zapping used by munmap instead of calling MADV_DONTNEED.
> 
>> 
>> For example, what happens with userfaultfd in this case? Can you get an
>> extra #PF, which would be visible to userspace, before the munmap is
>> finished?
>> 
> 
> userfaultfd is handled by regular munmap path. So, no change to userfaultfd part.

Right. I see it now.

> 
>> 
>> In addition, would it be ok for the user to potentially get a zeroed page in
>> the time window after the MADV_DONTNEED finished removing a PTE and before
>> the munmap() is done?
>> 
> 
> This should be undefined behavior according to Michal. This has been discussed in https://lwn.net/Articles/753269/.

Thanks for the reference.

Reading the man page I see: "All pages containing a part of the indicated
range are unmapped, and subsequent references to these pages will generate
SIGSEGV."

To me it sounds pretty well-defined, and this implementation does not follow
this definition. I would expect the man page to be updated and indicate that
the behavior has changed.

Regards,
Nadav



