On Tue, Apr 23, 2019 at 8:58 AM Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, Apr 23, 2019 at 08:30:46AM -0700, Shakeel Butt wrote:
> > Though this is quite late, I still want to propose a topic for
> > discussion during LSFMM'19 which I think will be beneficial for Linux
> > users in general, but particularly for data center users who run a
> > range of different workloads and want to reduce their memory cost.
> >
> > Topic: Proactive Memory Reclaim
> >
> > Motivation/Problem: Memory overcommit is the most commonly used
> > technique to reduce the cost of memory for large infrastructure
> > owners. However, memory overcommit can adversely impact the
> > performance of latency-sensitive applications by triggering direct
> > memory reclaim. Direct reclaim is unpredictable and disastrous for
> > latency-sensitive applications.
> >
> > Solution: Proactively reclaim memory from the system to drastically
> > reduce the occurrences of direct reclaim. Target cold memory to keep
> > the refault rate of the applications acceptable (i.e. no impact on
> > performance).
> >
> > Challenges:
> > 1. Tracking cold memory efficiently.
> > 2. Lack of infrastructure to reclaim specific memory.
> >
> > Details: The existing "Idle Page Tracking" interface allows tracking
> > cold memory on a system, but it becomes prohibitively expensive as
> > the machine size grows. Also, there is no way for user space to
> > reclaim a specific 'cold' page. I want to present our implementation
> > of cold memory tracking and reclaim. The aim is to make it generally
> > beneficial to many more users and to upstream it.
> >
>
> Why is this not partially addressed by tuning vm.watermark_scale_factor?

We want more control over exactly which memory pages to reclaim. The
definition of cold memory can be very job specific. With kswapd, that
is not possible.

> As for a specific cold page, why not mmap the page in question,
> msync(MS_SYNC) and call madvise(MADV_DONTNEED)? It may not be perfect
> in all cases admittedly.
>

Wouldn't this throw away the anon memory? We want to swap that out. In
our production we actually only target swap-backed memory, due to the
very low page fault cost from zswap.

Shakeel