On Fri, 24 May 2013, Peter Zijlstra wrote:

> Patch bc3e53f682 ("mm: distinguish between mlocked and pinned pages")
> broke RLIMIT_MEMLOCK.

Nope, the patch fixed a problem with double accounting.

The problem that we seem to have is to define what mlocked and pinned mean
and how this relates to RLIMIT_MEMLOCK.

mlocked pages are pages that are movable (not pinned!!!) and that are
marked in some way by user space actions as mlocked (POSIX semantics). They
are marked with a special page flag (PG_mlocked).

Pinned pages are pages that have an elevated refcount because the hardware
needs to use these pages for I/O. The elevated refcount may be temporary
(then we don't care about this) or for a longer time (such as the memory
registration of the IB subsystem). That is when we account the memory as
pinned. The elevated refcount stops page migration and other things from
trying to move that memory.

Pages can be both pinned and mlocked. Before my patch those two issues
were conflated since the same counter was used, and therefore these pages
were counted twice. If an RDMA application was running using mlockall() and
was performing large scale I/O then the counters could show extraordinarily
large numbers and the VM would start to behave erratically.

It is important for the VM to know which pages cannot be evicted, but that
involves many more pages due to dirty pages etc etc.

So far the assumption has been that RLIMIT_MEMLOCK is a limit on the pages
that userspace has mlocked.

You want the counter to mean something different, it seems. What is it?

I think we first need to be clear on what we want to accomplish and what
these counters actually should count before changing things.

I would certainly appreciate improvements in this area, but resurrecting
the conflation between mlocked and pinned pages is not the way to go.

> This patch proposes to properly fix the problem by introducing
> VM_PINNED.
> This also provides the groundwork for a possible mpin()
> syscall or MADV_PIN -- although these are not included.

Maybe add a new PIN page flag? Pages are not pinned per vma as the patch
seems to assume.

> It recognises that pinned page semantics are a strict super-set of
> locked page semantics -- a pinned page will not generate major faults
> (and thus satisfies mlock() requirements).

Not exactly true. Pinned pages may not have the mlocked flag set and they
are not managed on the unevictable LRU lists of the MM.

> If people find this approach unworkable, I request we revert the above
> mentioned patch to at least restore RLIMIT_MEMLOCK to a usable state
> again.

Cannot do that. This will cause the breakage that the patch was fixing to
resurface.
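
As a toy illustration of the double accounting described above: this is
not kernel code, and struct toy_page and the count_* helpers are made-up
names for this sketch only. It compares a single counter that is charged
both when a page is mlocked and when it is pinned against two separate
counters.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: NOT a kernel structure, names are hypothetical. */
struct toy_page {
	bool mlocked;	/* marked by user space, think PG_mlocked */
	bool pinned;	/* long-term elevated refcount, e.g. IB registration */
};

/*
 * Conflated accounting: one shared counter, charged once for mlock and
 * once more for pinning, so a page that is both is counted twice.
 */
static size_t count_conflated(const struct toy_page *pages, size_t n)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += (pages[i].mlocked ? 1 : 0) + (pages[i].pinned ? 1 : 0);
	return total;
}

/* Separate accounting: mlocked pages only. */
static size_t count_mlocked(const struct toy_page *pages, size_t n)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += pages[i].mlocked ? 1 : 0;
	return total;
}

/* Separate accounting: pinned pages only. */
static size_t count_pinned(const struct toy_page *pages, size_t n)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += pages[i].pinned ? 1 : 0;
	return total;
}
```

With three pages (one mlocked only, one pinned only, one both),
count_conflated() yields 4 although only 3 pages exist, while
count_mlocked() and count_pinned() each report 2.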