On Mon, Jan 07, 2019 at 03:21:10PM -0800, Dan Williams wrote:
> Randomization of the page allocator improves the average utilization of
> a direct-mapped memory-side-cache. Memory side caching is a platform
> capability that Linux has been previously exposed to in HPC
> (high-performance computing) environments on specialty platforms. In
> that instance it was a smaller pool of high-bandwidth-memory relative to
> higher-capacity / lower-bandwidth DRAM. Now, this capability is going to
> be found on general purpose server platforms where DRAM is a cache in
> front of higher latency persistent memory [1].
> 

So I glanced through the series and, while I won't nak it, I'm not a
major fan either, so I won't ack it. While there are merits to
randomisation in terms of cache coloring, it may not be robust. IIRC,
the main strength of randomisation vs being smart was "it's simple and
usually doesn't fall apart completely". In particular I'd worry that
compaction will undo all the randomisation work by moving related pages
into the same direct-mapped lines. Furthermore, the runtime list
management of "randomly place at head or tail of list" will have
variable and non-deterministic outcomes and may also be undone by
either high-order merging or compaction.

In an ideal world there would be a proper cache-coloring allocation
algorithm, but those previously failed as the runtime overhead exceeded
the actual benefit, particularly as fully associative caches became
more popular and there was no universal "one solution fits all". One
hack around it may be to have per-task free-lists that put free pages
into buckets, with the obvious caveat that those lists would need
draining and secondary locking. It may also need arch and/or driver
hooks to detect how the colors are managed, which could turn into a
mess.

The big plus of the series is that it's relatively simple and appears
to be isolated enough that it only has an impact when the necessary
hardware is in place. It will deal with some cases but I'm not sure
it'll survive long-term, particularly if HPC users continue to report
from the field that reboots are necessary to reshuffle the lists (taken
from your linked documents). That workaround of running STREAM before a
job starts and rebooting the machine if the performance SLAs are not
met is horrid.

-- 
Mel Gorman
SUSE Labs
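
For anyone who hasn't read the series, the list management in question
amounts to a coin flip on the free path, roughly along the lines of the
sketch below. This is an illustrative user-space model only; the
structure and function names here are made up for the example and are
not taken from the patches themselves.

	/*
	 * Illustrative sketch only: models the "randomly place at head
	 * or tail of list" behaviour discussed above.  Names and
	 * structure are simplified and are not from the series itself.
	 */
	#include <stdio.h>
	#include <stdlib.h>
	#include <time.h>

	struct page_stub {
		unsigned long pfn;
		struct page_stub *next;
	};

	static struct page_stub *free_list;

	/* On free, coin-flip between head and tail insertion. */
	static void shuffle_free_page(struct page_stub *page)
	{
		if ((rand() & 1) || !free_list) {
			/* Head insertion. */
			page->next = free_list;
			free_list = page;
		} else {
			/* Tail insertion. */
			struct page_stub *tail = free_list;

			while (tail->next)
				tail = tail->next;
			tail->next = page;
			page->next = NULL;
		}
	}

	int main(void)
	{
		srand(time(NULL));

		for (unsigned long pfn = 0; pfn < 8; pfn++) {
			struct page_stub *page = malloc(sizeof(*page));

			page->pfn = pfn;
			page->next = NULL;
			shuffle_free_page(page);
		}

		/* Allocation order now varies from run to run. */
		for (struct page_stub *p = free_list; p; p = p->next)
			printf("pfn %lu\n", p->pfn);

		return 0;
	}

Running it twice gives a different pfn ordering each time, which is the
variable, non-deterministic outcome referred to above; any later pass
that re-sorts or merges the list (high-order merging, compaction) can
discard that ordering.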