On Mon 02-07-18 13:48:45, Andrew Morton wrote:
> On Mon, 2 Jul 2018 16:05:02 +0200 Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> > On Fri 29-06-18 20:15:47, Andrew Morton wrote:
> > [...]
> > > Would one of your earlier designs have addressed all usecases? I
> > > expect the dumb unmap-a-little-bit-at-a-time approach would have?
> >
> > It has been already pointed out that this will not work.
>
> I said "one of". There were others.

Well, I was aware of only two potential solutions. Either do the heavy
lifting under the shared lock and do the rest with the exclusive one, or
this one, which drops the lock between parts. Maybe I have missed others?

> > You simply
> > cannot drop the mmap_sem during unmap because another thread could
> > change the address space under your feet. So you need some form of
> > VM_DEAD and handle concurrent and conflicting address space operations.
>
> Unclear that this is a problem. If a thread does an unmap of a range
> of virtual address space, there's no guarantee that upon return some
> other thread has not already mapped new stuff into that address range.
> So what's changed?

Well, consider the following scenario:
Thread A = calling mmap(NULL, sizeA)
Thread B = calling munmap(addr, sizeB)

They do not use any external synchronization and rely on the atomic
munmap. Thread B only munmaps a range that it knows belongs to it (e.g.
it called mmap for it in the past). It should be clear that Thread A
should not get an address from the [addr, addr + sizeB) range, right? In
the simplest case it will not happen. But let's say that the range has
unmapped holes for whatever reason. Now anytime munmap drops the
exclusive lock after handling one VMA, Thread A might find its sizeA
range there and use it. Thread B then might remove this new range as
soon as it gets its exclusive lock again.

Is such code safe? No, it is not, and I would call it fragile at best,
but people tend to do weird things and atomic munmap behavior is
something they can easily depend on.
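
The interleaving above can be made concrete with a toy user-space model.
This is only a sketch under stated assumptions, not kernel code: the
address space is an array of eight pages, toy_mmap() is a hypothetical
first-fit allocator standing in for mmap(NULL, sizeA), and the
drop_lock_per_vma flag is an invented knob that lets Thread A run at the
point where a piecewise munmap would release the exclusive lock after
its first VMA. The layout "XB.BBBXX" (one-page hole inside Thread B's
range) is likewise made up for illustration.

```c
#include <assert.h>
#include <string.h>

#define NPAGES 8
#define FREE   '.'

/* Toy address space: one owner tag per page ('X', 'A', 'B' or FREE). */
static char pages[NPAGES + 1];

/* Toy mmap(NULL, len): first-fit search, returns the first page or -1. */
static int toy_mmap(char who, int len)
{
    for (int start = 0; start + len <= NPAGES; start++) {
        int fits = 1;
        for (int i = 0; i < len; i++)
            if (pages[start + i] != FREE)
                fits = 0;
        if (fits) {
            memset(&pages[start], who, (size_t)len);
            return start;
        }
    }
    return -1;
}

/*
 * Thread B unmaps [addr, addr + size_b); Thread A maps size_a pages.
 * With drop_lock_per_vma == 0 the unmap is atomic and A runs afterwards.
 * With drop_lock_per_vma == 1 A runs right after B finished its first VMA.
 * Returns how many of A's pages are still mapped at the end.
 */
static int surviving_pages_of_a(int drop_lock_per_vma)
{
    const int addr = 1, size_b = 5, size_a = 2;
    int a_start = -1, alive = 0;

    /* X = unrelated mapping, B = Thread B's VMAs, '.' = unmapped hole. */
    memcpy(pages, "XB.BBBXX", NPAGES + 1);

    for (int p = addr; p < addr + size_b; p++) {
        int was_mapped = (pages[p] != FREE);
        int last = (p + 1 == addr + size_b);

        pages[p] = FREE; /* munmap removes whatever is mapped here */

        /* End of a VMA: the piecewise variant "drops the lock" here. */
        if (drop_lock_per_vma && a_start < 0 && was_mapped &&
            (last || pages[p + 1] == FREE))
            a_start = toy_mmap('A', size_a);
    }

    if (a_start < 0) /* atomic case: A only gets in after B is done */
        a_start = toy_mmap('A', size_a);
    assert(a_start >= 0);

    for (int i = 0; i < size_a; i++)
        alive += (pages[a_start + i] == 'A');
    return alive;
}
```

In this model surviving_pages_of_a(0) leaves both of Thread A's pages
mapped, while surviving_pages_of_a(1) leaves only one: Thread A is
handed pages 1 and 2 once page 1 is freed, and Thread B then tears down
page 2 when it resumes walking its range.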

Another example would be atomic address range probing with
MAP_FIXED_NOREPLACE. It would simply break for similar reasons.

I remember my attempt to make MAP_LOCKED consistent with mlock (if the
population fails then return -ENOMEM). That required dropping the shared
mmap_sem and taking it in exclusive mode (because we do not have
upgrade_read), and Linus was strongly against it [1][2] for very similar
reasons. If you drop the lock you simply do not know what happened under
your feet.

[1] http://lkml.kernel.org/r/CA+55aFydkG-BgZzry5DrTzueVh9VvEcVJdLV8iOyUphQk=0vpw@xxxxxxxxxxxxxx
[2] http://lkml.kernel.org/r/CA+55aFyajquhGhw59qNWKGK4dBV0TPmDD7-1XqPo7DZWvO_hPg@xxxxxxxxxxxxxx
--
Michal Hocko
SUSE Labs