On Wed 11-05-16 13:07:33, Peter Zijlstra wrote:
>
>
> On 05/13/2015 04:38 PM, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@xxxxxxx>
> >
> > MAP_LOCKED had a subtly different semantic from mmap(2)+mlock(2) since
> > it has been introduced.
> > mlock(2) fails if the memory range cannot get populated to guarantee
> > that no future major faults will happen on the range. mmap(MAP_LOCKED) on
> > the other hand silently succeeds even if the range was populated only
> > partially.
> >
> > Fixing this subtle difference in the kernel is rather awkward because
> > the memory population happens after mm locks have been dropped and so
> > the cleanup before returning failure (munlock) could operate on something
> > else than the originally mapped area.
> >
> > E.g. speculative userspace page fault handler catching SEGV and doing
> > mmap(fault_addr, MAP_FIXED|MAP_LOCKED) might discard portion of a racing
> > mmap and lead to lost data. Although it is not clear whether such a
> > usage would be valid, mmap page doesn't explicitly describe requirements
> > for threaded applications so we cannot exclude this possibility.
> >
> > This patch makes the semantic of MAP_LOCKED explicit and suggest using
> > mmap + mlock as the only way to guarantee no later major page faults.
> >
>
> URGH, this really blows chunks. It basically means MAP_LOCKED is pointless
> cruft and we might as well remove it.

Yeah, the usefulness of MAP_LOCKED is somewhat reduced. Everybody who
wants the full semantic really has to use mlock(2).

> Why not fix it proper?

I have tried, but it turned out to be a problem because we drop mmap_sem
after we have initialized the VMA and, as Linus pointed out, there are
multithreaded applications which do opportunistic memory
management[1].
So we would have to hold the mmap_sem for write during the whole VMA
setup + population, and that doesn't seem to be worth all the trouble
when we are not even sure whether somebody relies on MAP_LOCKED to have
the hard mlock semantic.

---
[1] http://lkml.kernel.org/r/CA+55aFydkG-BgZzry5DrTzueVh9VvEcVJdLV8iOyUphQk=0vpw@xxxxxxxxxxxxxx
--
Michal Hocko
SUSE Labs
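
For anybody who wants the hard semantic from userspace, the mmap + mlock
sequence discussed above might look roughly like the sketch below. This is
only an illustration, not part of the patch: the helper name
map_locked_strict() is made up, and it assumes an anonymous private mapping.
The point is simply that the mlock(2) return value is checked, so a range
that could not be locked is reported to the caller instead of being silently
accepted.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (name and shape are assumptions for illustration):
 * map `len` bytes of anonymous memory and lock it with an explicit
 * mlock(2) call rather than relying on MAP_LOCKED.
 * Returns the mapping, or NULL with errno set if mapping or locking failed.
 */
static void *map_locked_strict(size_t len)
{
    void *addr;

    assert(len > 0);  /* a zero-length mapping is a caller bug */

    addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED)
        return NULL;

    /* Unlike MAP_LOCKED, a failure to lock the range is reported here. */
    if (mlock(addr, len) != 0) {
        int saved = errno;      /* munmap() may overwrite errno */

        munmap(addr, len);
        errno = saved;
        return NULL;
    }

    return addr;
}
```

A caller would do something like p = map_locked_strict(4096) and treat a
NULL return as "could not lock", e.g. because RLIMIT_MEMLOCK is too low.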