On Tue, Apr 23, 2019 at 8:58 AM Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, Apr 23, 2019 at 08:30:46AM -0700, Shakeel Butt wrote:
> > Though this is quite late, I still want to propose a topic for
> > discussion during LSFMM'19 which I think will be beneficial for Linux
> > users in general, but particularly for data center users who run a
> > range of different workloads and want to reduce their memory cost.
> >
> > Topic: Proactive Memory Reclaim
> >
> > Motivation/Problem: Memory overcommit is the most commonly used
> > technique to reduce the cost of memory for large infrastructure
> > owners. However, memory overcommit can adversely impact the
> > performance of latency-sensitive applications by triggering direct
> > memory reclaim. Direct reclaim is unpredictable and disastrous for
> > latency-sensitive applications.
> >
> > Solution: Proactively reclaim memory from the system to drastically
> > reduce the occurrences of direct reclaim. Target cold memory to keep
> > the refault rate of the applications acceptable (i.e. no impact on
> > performance).
> >
> > Challenges:
> > 1. Tracking cold memory efficiently.
> > 2. Lack of infrastructure to reclaim specific memory.
> >
> > Details: The existing "Idle Page Tracking" interface allows tracking
> > cold memory on a system, but it becomes prohibitively expensive as
> > the machine size grows. Also, there is no way for user space to
> > reclaim a specific 'cold' page. I want to present our implementation
> > of cold memory tracking and reclaim. The aim is to make it generally
> > beneficial to many more users and to upstream it.
> >
>
> Why is this not partially addressed by tuning vm.watermark_scale_factor?

We want more control over exactly which memory pages to reclaim. The
definition of cold memory can be very job specific. With kswapd, that
is not possible.

> As for a specific cold page, why not mmap the page in question,
> msync(MS_SYNC) and call madvise(MADV_DONTNEED)? It may not be perfect
> in all cases admittedly.
>

Wouldn't this throw away the anon memory? We want to swap that out. In
our production we actually only target swap-backed memory, due to the
very low page fault cost from zswap.

Shakeel