Re: [RFC v3 PATCH 4/5] mm: mmap: zap pages with read mmap_sem for large mapping

Michal Hocko <mhocko@xxxxxxxxxx> · Tue, 3 Jul 2018 08:09:21 +0200

On Mon 02-07-18 13:48:45, Andrew Morton wrote:
> On Mon, 2 Jul 2018 16:05:02 +0200 Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> 
> > On Fri 29-06-18 20:15:47, Andrew Morton wrote:
> > [...]
> > > Would one of your earlier designs have addressed all usecases?  I
> > > expect the dumb unmap-a-little-bit-at-a-time approach would have?
> > 
> > It has been already pointed out that this will not work.
> 
> I said "one of".  There were others.

Well, I was only aware of two potential solutions: either do the heavy
lifting under the shared lock and the rest under the exclusive one (as
this patch does), or drop the lock piecewise. Maybe I have missed others?

> > You simply
> > cannot drop the mmap_sem during unmap because another thread could
> > change the address space under your feet. So you need some form of
> > VM_DEAD and handle concurrent and conflicting address space operations.
> 
> Unclear that this is a problem.  If a thread does an unmap of a range
> of virtual address space, there's no guarantee that upon return some
> other thread has not already mapped new stuff into that address range. 
> So what's changed?

Well, consider the following scenario:
Thread A = calling mmap(NULL, sizeA)
Thread B = calling munmap(addr, sizeB)

They do not use any external synchronization and rely on munmap being
atomic. Thread B only unmaps a range that it knows belongs to it (e.g. one
it mmapped in the past). It should be clear that Thread A must not get an
address from the [addr, addr + sizeB) range, right? In the simplest case,
where the whole range is mapped, this will not happen. But let's say that
the [addr, addr + sizeB) range has unmapped holes for whatever reason.
Now, any time munmap drops the exclusive lock after handling one VMA,
Thread A might find room for its sizeA range there (e.g. the space freed
so far merged with an adjacent hole) and use it. Thread B might then
remove this new range as soon as it gets its exclusive lock again.

Is such code safe? No, it is not, and I would call it fragile at best,
but people tend to do weird things and atomic munmap behavior is
something they can easily come to depend on.
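To make the interleaving concrete, here is a deterministic toy model in userspace C (not kernel code). The slot array, the top-down first-fit policy of `toy_mmap`, the `gap` hook and every other name below are assumptions for illustration only; the real allocator and VMA handling are far more involved. The one-slot hole in Thread B's range is too small for Thread A's two-slot request until B's first step frees the neighbouring VMA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy address space: one VMA id per page slot, 0 means unmapped.
 * Purely illustrative; it does not mirror real kernel data structures. */
#define NSLOTS 10
#define THREAD_A_ID 3

struct toy_mm {
    int slot[NSLOTS];
};

/* Layout: VMA 1 at [0,2), a one-slot hole at [2,3), VMA 2 at [3,8),
 * and an unrelated VMA 9 at [8,10). Thread B owns VMAs 1 and 2. */
static void toy_setup(struct toy_mm *mm)
{
    static const int init[NSLOTS] = { 1, 1, 0, 2, 2, 2, 2, 2, 9, 9 };
    for (int i = 0; i < NSLOTS; i++)
        mm->slot[i] = init[i];
}

/* Hypothetical top-down first-fit allocator: maps the highest free run of
 * `len` slots and returns its start, or -1 if nothing fits. */
static int toy_mmap(struct toy_mm *mm, int len, int id)
{
    assert(len > 0 && len <= NSLOTS);
    for (int start = NSLOTS - len; start >= 0; start--) {
        bool free_run = true;
        for (int i = start; i < start + len; i++)
            if (mm->slot[i] != 0) { free_run = false; break; }
        if (free_run) {
            for (int i = start; i < start + len; i++)
                mm->slot[i] = id;
            return start;
        }
    }
    return -1;
}

/* Stands in for "another thread runs while the lock is dropped". */
typedef void (*unlocked_fn)(struct toy_mm *mm);

/* Unmap [start, end) one VMA at a time. A non-NULL `gap` is called after
 * each VMA, modelling a munmap that drops the exclusive lock between VMAs;
 * with gap == NULL the whole range is handled under one lock hold. */
static void toy_munmap(struct toy_mm *mm, int start, int end, unlocked_fn gap)
{
    int cursor = start;
    while (cursor < end) {
        int id = mm->slot[cursor];
        if (id == 0) {                /* hole: nothing to unmap here */
            cursor++;
            continue;
        }
        /* Remove the part of this VMA that lies inside [cursor, end). */
        while (cursor < end && mm->slot[cursor] == id)
            mm->slot[cursor++] = 0;
        if (gap)
            gap(mm);
    }
}

/* Thread A: mmap(NULL, 2 slots), performed at most once. */
static bool thread_a_ran;
static int thread_a_start = -1;

static void thread_a(struct toy_mm *mm)
{
    if (!thread_a_ran) {
        thread_a_ran = true;
        thread_a_start = toy_mmap(mm, 2, THREAD_A_ID);
    }
}

static int count_slots(const struct toy_mm *mm, int id)
{
    int n = 0;
    for (int i = 0; i < NSLOTS; i++)
        n += (mm->slot[i] == id);
    return n;
}
```

With `gap == NULL`, Thread A either finds no room (if it runs first) or gets slots [6,8) intact (if it runs after). With the lock dropped after VMA 1, Thread A lands at [1,3), straddling the freed VMA and the hole, and Thread B's next step unmaps slot 2 out from under it.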

Another example would be probing an address range atomically with
MAP_FIXED_NOREPLACE. It would simply break for similar reasons.
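A minimal userspace sketch of such probing, assuming Linux 4.17 or later; older kernels do not know the flag and treat the address as a plain hint, hence the address check. `probe_pages` is a hypothetical helper, and the fallback `MAP_FIXED_NOREPLACE` value is the one from the Linux uapi headers, for libc headers that predate the flag.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0x100000  /* value from the Linux uapi headers */
#endif

/* Hypothetical helper: try to claim exactly [addr, addr + npages pages).
 * Returns addr on success, or NULL if the range is (partly) in use
 * (errno == EEXIST) or the kernel placed the mapping elsewhere. */
static void *probe_pages(void *addr, size_t npages)
{
    assert(npages > 0);
    size_t len = npages * (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);
    if (p == MAP_FAILED)
        return NULL;              /* EEXIST: overlaps an existing mapping */
    if (p != addr) {              /* flag ignored: address was only a hint */
        munmap(p, len);
        return NULL;
    }
    return p;
}
```

The probe is only meaningful because the overlap check and the insertion happen under one hold of mmap_sem. If munmap could drop the lock partway through, a probe might presumably succeed on a piece of a range that another thread is still in the middle of unmapping, and lose it a moment later.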

I remember my attempt to make MAP_LOCKED consistent with mlock (return
-ENOMEM if the population fails). That required dropping the shared
mmap_sem and retaking it in exclusive mode (because we do not have
upgrade_read), and Linus was strongly against it [1][2] for very similar
reasons. If you drop the lock, you simply do not know what happened under
your feet.

[1] http://lkml.kernel.org/r/CA+55aFydkG-BgZzry5DrTzueVh9VvEcVJdLV8iOyUphQk=0vpw@xxxxxxxxxxxxxx
[2] http://lkml.kernel.org/r/CA+55aFyajquhGhw59qNWKGK4dBV0TPmDD7-1XqPo7DZWvO_hPg@xxxxxxxxxxxxxx
-- 
Michal Hocko
SUSE Labs