On Thu, 14 Feb 2019, Jason Gunthorpe wrote:

> On Thu, Feb 14, 2019 at 01:46:51PM -0800, Ira Weiny wrote:
> > > > > Really unclear how to fix this. The pinned/locked split with two
> > > > > buckets may be the right way.
> > > >
> > > > Are you suggesting that we have 2 user limits?
> > >
> > > This is what RDMA has done since CL's patch.
> >
> > I don't understand? What is the other _user_ limit (other than
> > RLIMIT_MEMLOCK)?
>
> With today's implementation RLIMIT_MEMLOCK covers two user limits,
> total number of pinned pages and total number of mlocked pages. The
> two are different buckets and not summed.

Applications were failing at some point because the two counts were
effectively summed. If you mlocked and then pinned a dataset larger than
half the memory of a system, things would get really weird.

There is also the possibility of even more duplication, because pages
can be pinned by multiple kernel subsystems. So the count could end up
more than doubled.

The sane thing was to account them separately so that mlocking and
pinning worked without apps failing, and then wait for another genius to
figure out how to improve the situation by getting the pinned page mess
under control.

It is not even advisable to check pinned pages against any limit,
because the same pages can be pinned by multiple subsystems.

The main problem here is that we only have a refcount to indicate
pinning and no way to clearly distinguish long-term from short-term
pins. To really fix this issue we would need a list of the subsystems
that have taken long-term pins on a page. But doing so would waste a lot
of memory and cause a significant performance regression.

The discussions here seem to be meandering around these issues. Nothing
so far convinces me that we have a clean solution at hand.
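To make the accounting argument concrete, here is a minimal toy model, assuming a per-process pair of counters checked against one limit. It is not kernel code: `MmAccounting`, `may_pin_separate`, `may_pin_summed` and `LIMIT` are hypothetical names, and the page counts are made-up illustration values, not measurements. It shows why summing the buckets fails an application whose mlocked dataset is over half the limit, and why a refcount-only scheme charges the same pages again when a second subsystem pins them.

```python
from dataclasses import dataclass


@dataclass
class MmAccounting:
    """Hypothetical per-process counters; a simplified model, not kernel code."""
    locked_pages: int = 0  # pages mlocked by the application
    pinned_pages: int = 0  # pages pinned by subsystems such as RDMA


def may_pin_separate(acct: MmAccounting, npages: int, limit: int) -> bool:
    """Separate buckets: pinned pages are checked against the limit on their own."""
    return acct.pinned_pages + npages <= limit


def may_pin_summed(acct: MmAccounting, npages: int, limit: int) -> bool:
    """Summed: mlocked and pinned pages share a single budget."""
    return acct.locked_pages + acct.pinned_pages + npages <= limit


LIMIT = 100  # hypothetical RLIMIT_MEMLOCK, expressed in pages

# The application mlocks a 60-page dataset, then registers the same
# 60 pages with a pinning subsystem.
acct = MmAccounting(locked_pages=60)
separate_ok = may_pin_separate(acct, 60, LIMIT)  # 60 <= 100
summed_ok = may_pin_summed(acct, 60, LIMIT)      # 120 > 100, yet only 60 distinct pages
if separate_ok:
    acct.pinned_pages += 60

# A second subsystem pins the very same pages. With only a refcount there is
# no record of who already holds a pin, so the pages are charged again.
second_ok = may_pin_separate(acct, 60, LIMIT)    # 120 > 100

print(separate_ok, summed_ok, second_ok)
```

With separate buckets the first registration succeeds while the summed check rejects it, and even separate buckets reject the second subsystem's pin of pages that are already pinned. Fixing that last case would require knowing which subsystems hold long-term pins on each page, which is exactly the per-page bookkeeping the mail argues is too costly.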