On 3/21/18 3:15 PM, Matthew Wilcox wrote:
On Wed, Mar 21, 2018 at 02:45:44PM -0700, Yang Shi wrote:
On 3/21/18 10:29 AM, Matthew Wilcox wrote:
On Wed, Mar 21, 2018 at 09:31:22AM -0700, Yang Shi wrote:
On 3/21/18 6:08 AM, Michal Hocko wrote:
Yes, this definitely sucks. One way to work around that is to split the
unmap into two phases. One to drop all the pages. That would only need
mmap_sem for read and then tear down the mapping with the mmap_sem for
write. This wouldn't help for parallel mmap_sem writers but those really
need a different approach (e.g. the range locking).
A page fault might sneak in and map a page which has already been unmapped?
Range locking should help a lot when manipulating small sections of a large
mapping in parallel, or multiple small mappings. It may not achieve much
for a single large mapping.
I don't think we need range locking. What if we do munmap this way:
Take the mmap_sem for write
Find the VMA
  If the VMA is large(*)
    Mark the VMA as deleted
    Drop the mmap_sem
    zap all of the entries
    Take the mmap_sem
  Else
    zap all of the entries
Continue finding VMAs
Drop the mmap_sem
Now we need to change everywhere which looks up a VMA to see if it needs
to care that the VMA is deleted (page faults, e.g., will need to SIGBUS; mmap
Marking the vma as deleted sounds good. The problem with my current approach
is that a concurrent page fault may succeed if it accesses a not yet unmapped
section. Marking the vma as deleted could tell the page fault that the vma is
not valid anymore, so it returns SIGSEGV.
does not care; munmap will need to wait for the existing munmap operation
Why doesn't mmap care? What about MAP_FIXED? It may fail unexpectedly, right?
Oh, I forgot about MAP_FIXED. Yes, MAP_FIXED should wait for the munmap
to finish. But a regular mmap can just pretend that it happened before
the munmap call and avoid the deleted VMAs.
But my test shows a race condition for a reduced-size mmap, which calls
do_munmap(). It may need to wait for the munmap to finish too.
So, in my patches, I just make the do_munmap() called from mmap() hold
mmap_sem the whole time.
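
To make the ordering concrete, here is a rough single-threaded sketch of the
scheme discussed above. It is a toy userspace model, not the code from the
patches: every toy_* name, the TOY_LARGE_PAGES threshold, and the
may_drop_sem parameter are made up for illustration, and the real fault
path would take mmap_sem for read before looking at the VMA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model. None of the toy_* names are kernel APIs. */
#define TOY_LARGE_PAGES 1024UL  /* made-up threshold for "large(*)" */

struct toy_mm {
	bool sem_held_write;     /* models mmap_sem held for write */
	unsigned long sem_drops; /* times munmap released it mid-way */
};

struct toy_vma {
	unsigned long nr_pages;  /* pages still mapped */
	bool deleted;            /* set once munmap has claimed the VMA */
};

enum toy_fault { TOY_FAULT_OK, TOY_FAULT_SIGSEGV };

static void toy_down_write(struct toy_mm *mm)
{
	assert(!mm->sem_held_write);
	mm->sem_held_write = true;
}

static void toy_up_write(struct toy_mm *mm)
{
	assert(mm->sem_held_write);
	mm->sem_held_write = false;
}

/* Stand-in for zapping all page table entries of the VMA. */
static void toy_zap(struct toy_vma *vma)
{
	vma->nr_pages = 0;
}

/*
 * may_drop_sem == false models do_munmap() called from mmap(),
 * which keeps mmap_sem held for the whole operation.
 */
static void toy_munmap(struct toy_mm *mm, struct toy_vma *vmas, size_t n,
		       bool may_drop_sem)
{
	toy_down_write(mm);
	for (size_t i = 0; i < n; i++) {
		struct toy_vma *vma = &vmas[i];

		vma->deleted = true; /* later lookups must see this */
		if (may_drop_sem && vma->nr_pages >= TOY_LARGE_PAGES) {
			toy_up_write(mm);   /* let other users of the mm run */
			mm->sem_drops++;
			toy_zap(vma);       /* the long part, without the lock */
			toy_down_write(mm);
		} else {
			toy_zap(vma);       /* small VMA: zap under the lock */
		}
	}
	toy_up_write(mm);
}

/* A fault on a deleted VMA is refused instead of mapping a page. */
static enum toy_fault toy_page_fault(const struct toy_vma *vma)
{
	return vma->deleted ? TOY_FAULT_SIGSEGV : TOY_FAULT_OK;
}
```

With may_drop_sem set, only the large VMA causes the lock to be dropped, and
a fault that arrives while it is being zapped sees the deleted flag. With
may_drop_sem clear (the mmap() path), the lock is never released in between.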
Thanks,
Yang