On Thu, 20 Jun 2013, Roland Dreier wrote:

> Christoph, your argument would be a lot more convincing if you stopped
> repeating this nonsense. Sure, in a strict sense, it might be true

Well, this is about tracking pages that need to stay resident, and since
the kernel does the pinning through the IB subsystem, they can be tracked
right there. No nonsense, and no need for a separate pinning system call.

> that the IB subsystem in the kernel is the code that actually pins
> memory, but given that unprivileged userspace can tell the kernel to
> pin arbitrary parts of its memory for any amount of time, is that
> relevant? And in fact taking your "initiate" word choice above, I
> don't even think your statement is true -- userspace initiates the
> pinning by, for example, doing an IB memory registration (libibverbs
> ibv_reg_mr() call), which turns into a system call, which leads to the
> kernel trying to pin pages. The pages aren't unpinned until userspace
> unregisters the memory (or causes a cleanup by closing the context
> fd).

In some sense userspace initiates everything, since the kernel's purpose
is to run applications. So you could say that everything is
user-initiated if you wanted to. However, the user-visible mechanism here
is the registration of memory with the IB subsystem for RDMA. The primary
intent is not to pin the pages but to make memory available for remote
I/O. The pages are pinned *because* otherwise the kernel could move or
evict them while remote RDMA operations still target them, and those
operations would then corrupt memory.

> Here's an argument by analogy. Would it make any sense for me to say
> userspace can't mlock memory, because only the kernel can set
> VM_LOCKED on a vma? Of course not. Userspace has the mlock() system
> call, and although the actual work happens in the kernel, we clearly
> want to be able to limit the amount of memory locked by the kernel ON
> BEHALF OF USERSPACE.

I would think that mlock is a memory management function, so there the
app/user directly says that the memory must not be evicted. This is
different for the IB subsystem, which deals with I/O and only indirectly
with memory. If we had a different mechanism to prevent reclaim etc.,
then we would not need to pin the pages.

Actually, there is such a mechanism that could be used here. If you had a
reserved memory region that is not mapped by the kernel (boot-time
allocation, device memory), then you could use VM_PFNMAP to refer to that
region, and the kernel would not be able to reclaim that memory. No
pinning would be necessary if the IB subsystem registered that type of
memory.
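For comparison, the mlock() path discussed in this thread can be sketched from userspace: the process itself asks the kernel to keep a page resident, and the kernel charges that page against RLIMIT_MEMLOCK, which shows up as VmLck in /proc/self/status. This is a minimal sketch assuming Linux and a libc that exports mlock()/munlock() (loaded via ctypes). The helper names vmlck_kb() and lock_one_page() are hypothetical. It illustrates only the mlock side of the analogy, not how the IB subsystem accounts pages pinned through ibv_reg_mr().

```python
import ctypes
import ctypes.util
import mmap
import os
import resource

# Assumption: Linux, with a libc that exports mlock()/munlock().
_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
_libc.mlock.argtypes = (ctypes.c_void_p, ctypes.c_size_t)
_libc.mlock.restype = ctypes.c_int
_libc.munlock.argtypes = (ctypes.c_void_p, ctypes.c_size_t)
_libc.munlock.restype = ctypes.c_int


def vmlck_kb():
    """Return the process's locked memory (VmLck, in kB) from /proc/self/status."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmLck:"):
                return int(line.split()[1])
    raise RuntimeError("VmLck not found in /proc/self/status")


def lock_one_page():
    """Hypothetical helper: mlock() one anonymous page and report the accounting.

    Returns (rlimit_memlock_soft, errno_or_0, vmlck_delta_kb).
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    size = mmap.PAGESIZE
    buf = mmap.mmap(-1, size)                 # anonymous, page-aligned mapping
    anchor = ctypes.c_char.from_buffer(buf)   # exposes the mapping's address
    addr = ctypes.addressof(anchor)
    try:
        before = vmlck_kb()
        if _libc.mlock(addr, size) != 0:
            # ENOMEM/EPERM: an unprivileged caller (no CAP_IPC_LOCK) asked
            # for more than its RLIMIT_MEMLOCK soft limit allows.
            return soft, ctypes.get_errno(), 0
        delta = vmlck_kb() - before           # kernel charged the locked page
        _libc.munlock(addr, size)
        return soft, 0, delta
    finally:
        del anchor                            # release the buffer export first
        buf.close()


if __name__ == "__main__":
    soft, err, delta = lock_one_page()
    if err == 0:
        print(f"mlock charged {delta} kB to VmLck (RLIMIT_MEMLOCK soft limit: {soft})")
    else:
        print(f"mlock refused: {os.strerror(err)} (RLIMIT_MEMLOCK soft limit: {soft})")
```

The limit is enforced on the explicit locking request itself. That is the property Roland's analogy asks for on the pinning side, and Christoph's point is that an IB registration is not such an explicit memory-management request.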