Hi

> My proposal would be as follows:
>
> sys_mlock
>   down_write(mmap_sem)
>   do_mlock()
>     for-each-vma
>       turn on VM_LOCKED and merge/split vma
>   up_write(mmap_sem)
>   for (addr = start of mlock range; addr < end of mlock range;
>        addr = next_addr)
>     down_read(mmap_sem)
>     find vma for addr
>     next_addr = end of the vma
>     if vma still has VM_LOCKED flag:
>       next_addr = min(next_addr, addr + few pages)
>       mlock a small batch of pages from that vma
>       (from addr to next_addr)
>     up_read(mmap_sem)
>
> Since a large mlock() can take a long time and we don't want to hold
> mmap_sem for that long, we have to allow other threads to grab
> mmap_sem and deal with the concurrency issues.

Sounds good. Could you please consider posting an actual patch?

> The races aren't actually too bad:
>
> * If some other thread creates new VM_LOCKED vmas within the mlock
> range while sys_mlock() is working: both threads will be trying to
> mlock_fixup the same page range at once. This is no big deal as
> __mlock_vma_pages_range already only needs mmap_sem held for read: the
> get_user_pages() part can safely proceed in parallel and the
> mlock_vma_page() part is protected by the page lock and won't do
> anything if the PageMlocked flag is already set.
>
> * If some other thread creates new non-VM_LOCKED vmas, or munlocks the
> same address ranges that mlock() is currently working on: the mlock()
> code needs to be careful here to not mlock the pages when the vmas
> don't have the VM_LOCKED flag anymore. From the user process point of
> view, things will look like if the mlock had completed first, followed
> by the munlock.

Yes, this is really the key point. If the user can't notice the race,
it practically doesn't exist.

> The other mlock related issue I have is that it marks pages as dirty
> (if they are in a writable VMA), and causes writeback to work on them,
> even though the pages have not actually been modified.
> This looks like it would be solvable with a new get_user_pages flag
> for mlock use (breaking cow etc, but not writing to the pages just
> yet).

To be honest, I haven't understood why the current code does so. I
dislike it too, but I'm not sure whether such a change is safe.

I hope another developer can comment on this ;-)
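
For what it's worth, below is a rough userspace toy of the two-phase
scheme quoted above, only to illustrate the re-check of VM_LOCKED each
time mmap_sem is retaken. It is not kernel code. Every toy_* name, the
batch size of 4 and the page-indexed address space are made up for
illustration; the locking is only marked by comments; and vma
split/merge is left out by assuming the range is aligned to vma
boundaries. The race callback runs at the point where another thread
could grab mmap_sem.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_VM_LOCKED 0x1u  /* stand-in for the kernel's VM_LOCKED bit */
#define TOY_BATCH     4     /* "few pages"; the value is arbitrary */
#define TOY_NPAGES    64    /* size of the toy address space, in pages */
#define TOY_MAX_VMAS  8

struct toy_vma { size_t start, end; unsigned flags; };  /* [start, end) */

struct toy_mm {
    struct toy_vma vmas[TOY_MAX_VMAS];  /* sorted, non-overlapping */
    size_t nr_vmas;
    bool mlocked[TOY_NPAGES];           /* models PageMlocked */
};

/* Runs while "mmap_sem" is dropped; models what another thread might do. */
typedef void (*toy_race_fn)(struct toy_mm *mm);

/* Like find_vma(): first vma whose end lies above addr, or NULL. */
static struct toy_vma *toy_find_vma(struct toy_mm *mm, size_t addr)
{
    for (size_t i = 0; i < mm->nr_vmas; i++)
        if (mm->vmas[i].end > addr)
            return &mm->vmas[i];
    return NULL;
}

static size_t toy_min(size_t a, size_t b) { return a < b ? a : b; }

/* Example race: "munlock" the last vma and clear its pages. */
static void toy_race_munlock_last(struct toy_mm *mm)
{
    if (mm->nr_vmas == 0)
        return;
    struct toy_vma *vma = &mm->vmas[mm->nr_vmas - 1];
    vma->flags &= ~TOY_VM_LOCKED;
    for (size_t p = vma->start; p < vma->end; p++)
        mm->mlocked[p] = false;
}

/* Returns the number of pages this call newly marked as mlocked.
 * Assumes end <= TOY_NPAGES and a range aligned to vma boundaries. */
static size_t toy_mlock(struct toy_mm *mm, size_t start, size_t end,
                        toy_race_fn race)
{
    size_t locked = 0;

    /* Phase 1: down_write(mmap_sem); flag each vma in range; up_write(). */
    for (size_t i = 0; i < mm->nr_vmas; i++)
        if (mm->vmas[i].start < end && mm->vmas[i].end > start)
            mm->vmas[i].flags |= TOY_VM_LOCKED;

    /* Phase 2: one short read-side critical section per batch. */
    for (size_t addr = start, next = start; addr < end; addr = next) {
        /* down_read(mmap_sem) would go here */
        struct toy_vma *vma = toy_find_vma(mm, addr);
        if (!vma || vma->start >= end)
            break;                      /* rest of the range is unmapped */
        if (addr < vma->start)
            addr = vma->start;          /* skip an unmapped hole */
        next = toy_min(vma->end, end);
        if (vma->flags & TOY_VM_LOCKED) {   /* re-check: flag may be gone */
            next = toy_min(next, addr + TOY_BATCH);
            for (size_t p = addr; p < next; p++)
                if (!mm->mlocked[p]) {
                    mm->mlocked[p] = true;
                    locked++;
                }
        }
        /* up_read(mmap_sem) would go here */
        if (race)
            race(mm);                   /* another thread gets its turn */
    }
    return locked;
}
```

With two vmas covering pages [0,16) and [16,32), calling
toy_mlock(&mm, 0, 32, toy_race_munlock_last) leaves only the first 16
pages marked. That is the same end state as the mlock completing first
and the munlock following it.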