at 4:34 PM, Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx> wrote:

> When running some mmap/munmap scalability tests with large memory
> (i.e. > 300GB), the below hung task issue may happen occasionally.
>
> INFO: task ps:14018 blocked for more than 120 seconds.
> Tainted: G E 4.9.79-009.ali3000.alios7.x86_64 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
> ps D 0 14018 1 0x00000004
> (snip)
>
> Zapping pages is the most time consuming part, according to the
> suggestion from Michal Hocko [1], zapping pages can be done with holding
> read mmap_sem, like what MADV_DONTNEED does. Then re-acquire write
> mmap_sem to manipulate vmas.

Does munmap() == MADV_DONTNEED + munmap()?

For example, what happens with userfaultfd in this case? Can you get an
extra #PF, which would be visible to userspace, before the munmap is
finished?

In addition, would it be ok for the user to potentially get a zeroed page
in the time window after the MADV_DONTNEED finished removing a PTE and
before the munmap() is done?

Regards,
Nadav
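The zeroed-page window asked about above follows from the documented MADV_DONTNEED semantics for private anonymous memory: once the PTE is zapped, the VMA is still in place, so a racing access is satisfied with a fresh zero-filled page rather than failing. A minimal single-threaded userspace sketch of that behavior follows, assuming Linux and glibc. The function name `contents_zero_after_dontneed()` is hypothetical, and the sketch only exercises the MADV_DONTNEED step in isolation, not the proposed kernel change itself.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if a populated private anonymous page
 * reads back as all zeros after MADV_DONTNEED, 0 if the old data is
 * still visible, and -1 on error.
 */
int contents_zero_after_dontneed(void)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Fault the page in with non-zero data. */
    memset(p, 0xAA, (size_t)page);

    /* Zap the PTE while leaving the VMA in place. */
    if (madvise(p, (size_t)page, MADV_DONTNEED) != 0) {
        munmap(p, (size_t)page);
        return -1;
    }

    /*
     * The VMA still exists, so this read faults in a fresh zero-filled
     * page; after a completed munmap() it would raise SIGSEGV instead.
     */
    int zero = 1;
    for (long i = 0; i < page; i++) {
        if (p[i] != 0) {
            zero = 0;
            break;
        }
    }

    munmap(p, (size_t)page);
    return zero;
}
```

On Linux the call is expected to return 1. If the range were also registered with userfaultfd in missing mode, the same read would be expected to be reported to the monitor as a missing-page fault instead of being silently zero-filled, which is the kind of userspace-visible extra #PF the userfaultfd question refers to.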