On Tue, Oct 22, 2013 at 9:20 AM, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Tue, Oct 22, 2013 at 4:48 PM, <walken@xxxxxxxxxx> wrote:
>> Generally the problems I see with mmap_sem are related to long latency
>> operations. Specifically, the mmap_sem write side is currently held
>> during the entire munmap operation, which iterates over user pages to
>> free them, and can take hundreds of milliseconds for large VMAs.
>
> So this would be the *perfect* place to just downgrade the semaphore
> from a write to a read.

It's not as simple as that, because we currently rely on mmap_sem
write side being held during page table teardown in order to exclude
things like follow_page() which may otherwise access page tables while
we are potentially freeing them.

I do think it's solvable, but it gets complicated fast. Hugh & I have
been talking about it; the approach I'm looking at would involve
unwiring the page tables first (under protection of the mmap_sem write
lock) and then iterating on the unwired page tables to free the data
pages, issue TLB shootdowns and free the actual page tables (we
probably don't need even the mmap_sem read side at that point). But,
that's nowhere like a 10 line change anymore at that point...

--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
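
The two-phase teardown described in the message can be illustrated with a small userspace toy. This is only a sketch of the locking pattern, not kernel code: `AddressSpace`, `munmap_two_phase` and `free_page` are hypothetical names, a plain `threading.Lock` stands in for the mmap_sem write side, and a dict of lists stands in for the page tables. TLB shootdowns are not modeled.

```python
import threading


class AddressSpace:
    """Toy stand-in for an mm: one lock plus per-VMA 'page tables'."""

    def __init__(self, tables):
        # Assumption: a plain mutex models only the mmap_sem write side.
        self.lock = threading.Lock()
        # Maps a VMA start address to the list of pages it references.
        self.tables = dict(tables)

    def lookup(self, vma_start):
        # Readers (think follow_page()) only see tables that are still wired.
        with self.lock:
            return self.tables.get(vma_start)


def free_page(page, freed):
    # Hypothetical placeholder for the slow per-page work.
    freed.append(page)


def munmap_two_phase(mm, vma_start):
    freed = []

    # Phase 1: unwire the page tables while holding the lock.
    # This is a single detach, independent of the number of pages.
    with mm.lock:
        detached = mm.tables.pop(vma_start, [])

    # Phase 2: walk the detached tables with no lock held. Nobody else
    # can reach them any more, so the long iteration blocks no one.
    for page in detached:
        free_page(page, freed)

    return freed
```

In this toy, the lock is held only for the detach, so the time spent under it no longer grows with the size of the VMA; the per-page loop runs on data that is already unreachable through `lookup`.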