On Mon 16-10-17 12:56:43, Cristopher Lameter wrote:
> On Mon, 16 Oct 2017, Michal Hocko wrote:
>
> > > We already have that issue and have ways to control that by tracking
> > > pinned and mlocked pages as well as limits on their allocations.
> >
> > Ohh, it is very different because the mlock limit is really small (64kB),
> > which is not even close to what this is supposed to be about. Moreover,
> > mlock doesn't prevent migration, and so it doesn't prevent
> > compaction from forming higher order allocations.
>
> The mlock limit is configurable. There is tracking of pinned pages as
> well.

I am not aware of any such generic tracking API. The attempt by Peter
has never been merged. So what we have right now is just ad hoc
tracking...

> > Really, this is just too dangerous without a deep consideration of all
> > the potential consequences. The more I think about this, the more I
> > am convinced that this should all be a driver-specific, mmap-based thing.
> > If it turns out to be too restrictive over time and there is more
> > experience with the usage, we can consider a more
> > generic API. But starting from the generic MAP_ flag is just asking for
> > problems.
>
> This issue is already present with the pinning of lots of memory via the
> RDMA API when in use for large gigabyte ranges.

... like in those

> There is nothing new aside
> from memory being contiguous with this approach.

which makes a hell of a difference. Once you allow pinning larger blocks
of memory, you make the whole compaction hopelessly ineffective.

> > > There is not much new here in terms of problems. The hardware that
> > > needs this seems to become more and more plentiful. That is why we need a
> > > generic implementation.
> >
> > It would really help to name that HW and other potential use cases
> > independent of the HW, because I am rather skeptical about the
> > _plentiful_ part. And so I really do not see any foundation to claim
> > the generic part.
> > Because, fundamentally, it is the HW which requires
> > the specific memory placement/physically contiguous range etc. So the
> > generic implementation doesn't really make sense in such a context.
>
> RDMA hardware? Storage interfaces? Look at what the RDMA subsystem
> and storage (NVME?) support.
>
> This is not a hardware specific thing but a reflection of the general
> limitations of the existing 4k page struct scheme that limits performance
> and causes severe pressure on I/O devices.

This is something more for storage people to comment on. I expect (NVME)
storage to use DAX and its support for large and direct access.

Nothing really prevents RDMA HW from providing an mmap implementation that
uses contiguous pages; we already provide an API to allocate large memory.
-- 
Michal Hocko
SUSE Labs