On 6/28/18 12:10 PM, Yang Shi wrote:
On 6/28/18 4:51 AM, Michal Hocko wrote:
On Wed 27-06-18 10:23:39, Yang Shi wrote:
On 6/27/18 12:24 AM, Michal Hocko wrote:
On Tue 26-06-18 18:03:34, Yang Shi wrote:
On 6/26/18 12:43 AM, Peter Zijlstra wrote:
On Mon, Jun 25, 2018 at 05:06:23PM -0700, Yang Shi wrote:
By looking this deeper, we may not be able to cover all the
unmapping range
for VM_DEAD, for example, if the start addr is in the middle of
a vma. We
can't set VM_DEAD to that vma since that would trigger SIGSEGV
for still
mapped area.
splitting can't be done with read mmap_sem held, so maybe just
set VM_DEAD
to non-overlapped vmas. Access to overlapped vmas (first and
last) will
still have undefined behavior.
Acquire mmap_sem for writing, split, mark VM_DEAD, drop mmap_sem.
Acquire
mmap_sem for reading, madv_free drop mmap_sem. Acquire mmap_sem for
writing, free everything left, drop mmap_sem.
?
Sure, you acquire the lock 3 times, but both write instances
should be
'short', and I suppose you can do a demote between 1 and 2 if you
care.
Thanks, Peter. Yes, by looking the code and trying two different
approaches,
it looks this approach is the most straight-forward one.
Yes, you just have to be careful about the max vma count limit.
Yes, we should just need copy what do_munmap does as below:
if (end < vma->vm_end && mm->map_count >= sysctl_max_map_count)
return -ENOMEM;
If the mas map count limit has been reached, it will return failure
before
zapping mappings.
Yeah, but as soon as you drop the lock and retake it, somebody might
have changed the adddress space and we might get inconsistency.
So I am wondering whether we really need upgrade_read (to promote read
to write lock) and do the
down_write
split & set up VM_DEAD
downgrade_write
unmap
upgrade_read
zap ptes
up_write
Promoting to write lock may be a trouble. There might be other users in
the critical section with read lock, we have to wait them to finish.
I'm supposed address space changing just can be done by mmap, mremap,
mprotect. If so, we may utilize the new VM_DEAD flag. If the VM_DEAD
flag is set for the vma, just return failure since it is being unmapped.
Does it sounds reasonable?
It looks we just need care about MAP_FIXED (mmap) and MREMAP_FIXED
(mremap), right?
How about letting them return -EBUSY or -EAGAIN to notify the
application? This changes the behavior a little bit, MAP_FIXED and
mremap may fail if they fail the race with munmap (if the mapping is
larger than 1GB). I'm not sure if any multi-threaded application uses
MAP_FIXED and MREMAP_FIXED very heavily which may run into the race
condition. I guess it should be rare to meet all the conditions to
trigger the race.
The programmer should be very cautious about MAP_FIXED.MREMAP_FIXED
since they may corrupt its own address space as the man page noted.
Thanks,
Yang
Thanks,
Yang
looks terrible, no question about that, but we won't drop the mmap sem
at any time.