On Wed, Oct 9, 2013 at 12:28 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> The workload that I got the report from was a virus scanner, it would
> spawn nr_cpus threads and {mmap file, scan content, munmap} through your
> filesystem.

So I suspect we could make the mmap_sem write area *much* smaller for
the normal cases.

Look at do_mmap_pgoff(), for example: it is run entirely under
mmap_sem, but 99% of what it does doesn't actually need the lock.

The part that really needs the lock is

        addr = get_unmapped_area(file, addr, len, pgoff, flags);
        addr = mmap_region(file, addr, len, vm_flags, pgoff);

but we hold it over all the other stuff too.

In fact, even if we moved the mmap_sem down into do_mmap(), and moved
code around a bit to only hold it over those functions, it would still
cover unnecessarily much. For example, while merging is common, not
merging is pretty common too, and we do that

        vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);

allocation under the lock. We could easily do things like preallocate
it outside the lock.

Right now mmap_sem covers pretty much the whole system call (we do do
some security checks outside of it).

I think the main issue is that nobody has ever cared deeply enough to
see how far this could be pushed. I suspect there is some low-hanging
fruit for anybody who is willing to handle the pain..

             Linus