On Wed, Dec 8, 2010 at 3:27 PM, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>> Currently mlock() holds mmap_sem in exclusive mode while the pages get
>> faulted in. In the case of a large mlock, this can potentially take a
>> very long time, during which various commands such as 'ps auxw' will
>> block. This makes sysadmins unhappy:
>>
>> real    14m36.232s
>> user    0m0.003s
>> sys     0m0.015s
>> (output from 'time ps auxw' while a 20GB file was being mlocked without
>> being previously preloaded into page cache)
>
> The kernel holds down_write(mmap_sem) for 14m36s?

Yes...

[... patch snipped off ...]

> Am I correct in believing that we'll still hold down_read(mmap_sem) for
> a quarter hour?

Yes, patch 1/6 changes the long hold time to be in read mode instead
of write mode, which is only a band-aid. But this prepares for patch
5/6, which releases mmap_sem whenever there is contention on it or
when blocking on disk reads.

> We don't need to hold mmap_sem at all while faulting in those pages,
> do we? We could just do
>
>	for (addr = start; addr < end; addr += PAGE_SIZE)
>		get_user(x, addr);
>
> and voila. If the pages are in cache and the ptes are set up then that
> will be *vastly* faster than the proposed code. If the get_user()
> takes a minor fault then it'll be slower. If it's a major fault then
> the difference probably doesn't matter much.

get_user wouldn't suffice if the page is already mapped in, as we need
to mark it as PageMlocked. Also, we need to skip IO and PFNMAP
regions. I don't think you can make things much simpler than what I
ended up with.

> But whatever. Is this patchset a half-fix, and should we rather be
> looking for a full-fix?

I think the series fully fixes the mlock() and mlockall() cases, which
have been the more pressing use case for us.
Even then, there are cases where we could still observe long mmap_sem
hold times - fundamentally, every place that calls get_user_pages (or
do_mmap, in the mlockall MCL_FUTURE case) with a large page range may
create such problems. From the looks of it, most of these places
wouldn't actually care if the mmap_sem got dropped in the middle of
the operation, but a general fix will have to involve looking at all
the call sites to be sure.

--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
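
For readers following the thread from userspace, the "touch one address per page" idea quoted above can be approximated outside the kernel. The sketch below is only an analogy, not the patchset's code: it uses plain reads from a mapped buffer rather than the kernel's get_user(), the helper names prefault_range() and prefault_then_mlock() are made up for illustration, and it assumes the start address is page-aligned. As the reply points out, touching pages does not by itself give mlock semantics, so mlock() is still called afterwards.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: read one byte from every page in [start, start + len)
 * so that each page is faulted in. Assumes 'start' is page-aligned.
 * Returns the number of pages touched.
 */
static size_t prefault_range(const void *start, size_t len)
{
    const volatile unsigned char *p = start;
    long page = sysconf(_SC_PAGESIZE);
    size_t touched = 0;

    assert(page > 0);
    for (size_t off = 0; off < len; off += (size_t)page) {
        (void)p[off];   /* volatile read: forces the access, and the fault */
        touched++;
    }
    return touched;
}

/*
 * Hypothetical helper: fault the range in first, then ask the kernel to
 * lock it. Returns mlock()'s result (0 on success, -1 with errno set,
 * for example when RLIMIT_MEMLOCK is exceeded).
 */
static int prefault_then_mlock(const void *start, size_t len)
{
    prefault_range(start, len);
    return mlock(start, len);
}
```

A caller would mmap() a file or anonymous region and pass the mapping to prefault_then_mlock(). Whether this shortens the mlock() call on a given kernel depends on how that kernel handles mmap_sem during the faults, which is the subject of this thread.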