On Tue 28-07-15 09:49:42, Eric B Munson wrote:
> On Tue, 28 Jul 2015, Michal Hocko wrote:
> 
> > [I am sorry but I didn't get to this sooner.]
> > 
> > On Mon 27-07-15 10:54:09, Eric B Munson wrote:
> > > Now that VM_LOCKONFAULT is a modifier to VM_LOCKED and
> > > cannot be specified independently, it might make more sense to mirror
> > > that relationship to userspace. Which would lead to something like the
> > > following:
> > 
> > A modifier makes more sense.
> > 
> > > To lock and populate a region:
> > > mlock2(start, len, 0);
> > > 
> > > To lock a region on fault:
> > > mlock2(start, len, MLOCK_ONFAULT);
> > > 
> > > If LOCKONFAULT is seen as a modifier to mlock, then having a flags
> > > argument of 0 mean classic mlock makes more sense to me.
> > > 
> > > To mlock current on fault only:
> > > mlockall(MCL_CURRENT | MCL_ONFAULT);
> > > 
> > > To mlock future on fault only:
> > > mlockall(MCL_FUTURE | MCL_ONFAULT);
> > > 
> > > To lock everything on fault:
> > > mlockall(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT);
> > 
> > Makes sense to me. The only remaining and still tricky part would be
> > the munlock{all}(flags) behavior. What should munlock(MLOCK_ONFAULT)
> > do? Keep the range locked and populate it, or simply ignore the flag
> > and just unlock?
> > 
> > I can see some sense in allowing munlockall(MCL_FUTURE[|MLOCK_ONFAULT]),
> > munlockall(MCL_CURRENT) resp. munlockall(MCL_CURRENT|MCL_FUTURE), but
> > other combinations sound weird to me.
> > 
> > Anyway, munlock with flags opens new doors of trickiness.

> In the current revision there are no new munlock[all] system calls
> introduced. munlockall() unconditionally cleared both MCL_CURRENT and
> MCL_FUTURE before the set and now unconditionally clears all three.
> munlock() does the same for VM_LOCKED and VM_LOCKONFAULT.

OK, if no new munlock{all}(flags) call is introduced then this is much
saner IMO.

> If the user
> wants to adjust mlockall flags today, they need to call mlockall a
> second time with the new flags; this remains true for mlockall after
> this set, and the same behavior is mirrored in mlock2.

OK, this makes sense to me.

> The only
> remaining question I have is: should we have 2 new mlockall flags so that
> the caller can explicitly set VM_LOCKONFAULT in the mm->def_flags vs
> locking all current VMAs on fault? I ask because if the user wants to
> lock all current VMAs the old way, but all future VMAs on fault, they
> have to call mlockall() twice:
> 
> mlockall(MCL_CURRENT);
> mlockall(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT);
> 
> This has the side effect of converting all the current VMAs to
> VM_LOCKONFAULT, but because they were all made present and locked in the
> first call, this should not matter in most cases.

I think this is OK (worth documenting though) considering that ONFAULT
is just a modifier for the current mlock* operation. The memory is
locked the same way in both cases - once the memory is present you
cannot tell whether it was populated during the mlock call or later
during a fault.

> The catch is that,
> like mmap(MAP_LOCKED), mlockall() does not communicate if mm_populate()
> fails. This has been true of mlockall() from the beginning, so I don't
> know if it needs more than an entry in the man page to clarify (which I
> will add when I add documentation for MCL_ONFAULT).

Yes, this is true, but unlike mmap it seems fixable, I guess. We do not
have to unmap anything on failure, and we can downgrade mmap_sem to read
for the fault-in so nobody can race with a concurrent mlock.
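(To make the interface discussed above concrete, here is a minimal
userspace sketch - illustrative only. The MLOCK_ONFAULT/MCL_ONFAULT
values and the syscall number are assumptions taken from the series as
posted and may change before it lands; there is no libc wrapper for
mlock2() yet, so the sketch goes through syscall(2).)

	#include <stdio.h>
	#include <sys/mman.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	#ifndef MLOCK_ONFAULT
	#define MLOCK_ONFAULT	0x01	/* assumed value from the series */
	#endif
	#ifndef MCL_ONFAULT
	#define MCL_ONFAULT	4	/* assumed value from the series */
	#endif
	#ifndef __NR_mlock2
	#define __NR_mlock2	325	/* x86_64; arch-specific, assumed */
	#endif

	/* No libc wrapper yet, so invoke the new syscall directly. */
	static int mlock2_syscall(void *start, size_t len, int flags)
	{
		return syscall(__NR_mlock2, start, len, flags);
	}

	int main(void)
	{
		size_t len = 4096;
		void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (buf == MAP_FAILED)
			return 1;

		/* Classic semantics: lock the range and populate it now. */
		if (mlock2_syscall(buf, len, 0))
			perror("mlock2(0)");

		/* Lock on fault: pages are locked as they are first touched. */
		if (mlock2_syscall(buf, len, MLOCK_ONFAULT))
			perror("mlock2(MLOCK_ONFAULT)");

		/*
		 * The two-call pattern from the mail: lock all current VMAs
		 * the classic way, then make future VMAs lock on fault. The
		 * second call converts current VMAs to VM_LOCKONFAULT, but
		 * they are already present and locked after the first call.
		 */
		if (mlockall(MCL_CURRENT))
			perror("mlockall(MCL_CURRENT)");
		if (mlockall(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT))
			perror("mlockall(MCL_ONFAULT)");

		munlockall();
		munmap(buf, len);
		return 0;
	}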
> In a much less
> likely corner case, it is not possible in the current setup to request
> all current VMAs be VM_LOCKONFAULT and all future be VM_LOCKED.

Vlastimil has already pointed that out. MCL_FUTURE doesn't clear
MCL_CURRENT. I was quite surprised at first, but it makes perfect
sense: an mlockall call should never lead to munlocking, that would be
just weird. Clearing MCL_FUTURE on an MCL_CURRENT-only call makes
sense, on the other hand, because the request is explicit about
_current_ memory and it doesn't lead to any munlocking.

-- 
Michal Hocko
SUSE Labs
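(A small sketch of the mlockall() flag semantics described above -
illustrative only, assuming the behavior exactly as stated in the
thread and the same assumed MCL_ONFAULT value as earlier.)

	#include <sys/mman.h>

	void mlockall_flag_semantics(void)
	{
		/* Lock everything mapped now and everything mapped later. */
		mlockall(MCL_CURRENT | MCL_FUTURE);

		/*
		 * A later MCL_FUTURE-only call does not munlock what is
		 * already locked - mlockall() never unlocks memory, it only
		 * resets the default flags that future mappings inherit.
		 */
		mlockall(MCL_FUTURE);

		/*
		 * An MCL_CURRENT-only call, however, does clear the
		 * MCL_FUTURE state: the request is explicit about current
		 * memory, and dropping the future-locking default munlocks
		 * nothing.
		 */
		mlockall(MCL_CURRENT);
	}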