Re: [PATCH 0/5] Transparent Page Placement for Tiered-Memory

Hasan Al Maruf <hasan3050@xxxxxxxxx> · Mon, 29 Nov 2021 19:36:34 -0500

Hi Huang,

We find the patches in the tiering series are well thought and helpful.
For our workloads, we initially started with that series and we find the
whole series is too complex and some features do not benefit as
expected. Therefore, we have come up with the current basic patches which
are essential and help achieve most of the intended behaviors while
reducing complexity as much as possible.

As we started with your tiering series (with 72 patches), there are
overlaps between our patches and the tiering series. We adopt the
functionalities from the tiering series, modify, and extend them to make
page placement mechanism simpler but workable. Here is the key points for
each of the patches in our Transparent Page Placement series.

Patch #1:
We combine all the promotion and demotion related statistics in this patch
Having statistics on both promotion, demotion, and failures help observe
the systems behavior and reason about performance behavior. Besides, anon
vs file breakdown in both promotion and demotion path help understand
application behavior on a tiered memory systems. As applications may have
different sensitivity toward the anon and file placements, this breakdown
in the migration path is often helpful to assess the effectiveness of the
page placement policy.

Patch #2:
This patch largely overlaps with your current series on NUMA Balancing.
https://lore.kernel.org/lkml/20211116013522.140575-1-ying.huang@xxxxxxxxx/
This patch is a combination of your Patch #2 and Patch #3 except the
static 10MB free space in the top-tier node to maintain a free headroom
for new allocation and promotion. Rather, we find having a user defined
demote watermark would make it more generic that we include in our patch#3

Patch #3:
This patch has the logic for having a separate demote watermark per node.
In the tiering series, that demote watermark is somewhat bound to the
cgroup and triggered on per-application basis. Besides, It only supports
cgroup-v1. However, we think, instead of cgroup based soft reclamation,
a global per-node demote watermark is more meaningful and should be the
basic one to start with. In that case, the user does not have to think
about per-application setup.

Patch #4:
This patch includes the code for kswapd based reclamation. As I mentioned
earlier, instead of cgroup-based reclamation, here we look whether a node
is balanced during each kswapd invocation. For top-tier node, we check
whether kswapd reclaimed till DEMOTE_WMARK is satisfied, for other nodes
the default mechanism continues. The differences between tiering series
and this patch is the cgroup based reclamation vs per-node reclamation.

Patch #5:
In your patches for promotion, you consider re-fault time for promotion
candidate selection. Although the hot-threshold is tunable, from our
experiments, we find this not helpful to some extent. For example, if
different subset of pages have different re-access time, time-based
promotion should not be able to distinguish between them. If you make
the time window long enough, then any infrequently accessed pages will
also become the promotion candidate, and later be a candidate for the
demotion.

In this patch, we propose LRU based promotion, which would give anon and
files different promotion paths. If pages are used sporadically at high
frequency, irregular pages would be eventually moved from the active LRU
list. We find that our LRU based approach can reduce up to 11x promotion
traffic while retaining the same application throughput for multiple
workloads.

Besides, with promotion rate limit, if files largely get promoted to
top-tier, anon promotion rate often gets hampered as files are taking the
large portion of the total rate (which often happen for applications that
generates huge caches). In our LRU-based approach, each type has their own
separate LRU to check. So for workloads with smaller anons and large file
usage, with LRU-based approach, we can see more anons are being promoted
rather than the files.

I don't mind this patchset being merged to your current patchset under
discussion or any later ones. But, I think this series contains the very
basic functionalities to have a workable page placement mechanism for
tiered-memory. This can obviously be augmented by the other features in
you future tiering series.

Best,
Hasan