On Mon, Jan 10, 2022 at 12:27:19PM +0200, Mike Rapoport wrote:
> Hi,
>
> On Tue, Jan 04, 2022 at 01:22:27PM -0700, Yu Zhao wrote:
> > Add /sys/kernel/mm/lru_gen/enabled as a runtime kill switch.
> >
> > Add /sys/kernel/mm/lru_gen/min_ttl_ms for thrashing prevention.
> > Compared with the size-based approach, e.g., [1], this time-based
> > approach has the following advantages:
> > 1) It's easier to configure because it's agnostic to applications and
> >    memory sizes.
> > 2) It's more reliable because it's directly wired to the OOM killer.
> >
> > Add /sys/kernel/debug/lru_gen for working set estimation and proactive
> > reclaim. Compared with the page table-based approach and the PFN-based
> > approach, e.g., mm/damon/[vp]addr.c, this lruvec-based approach has
> > the following advantages:
> > 1) It offers better choices because it's aware of memcgs, NUMA nodes,
> >    shared mappings and unmapped page cache.
> > 2) It's more scalable because it's O(nr_hot_evictable_pages), whereas
> >    the PFN-based approach is O(nr_total_pages).
> >
> > Add /sys/kernel/debug/lru_gen_full for debugging.
> >
> > [1] https://lore.kernel.org/lkml/20211130201652.2218636d@xxxxxxxxxxxxx/
> >
> > Signed-off-by: Yu Zhao <yuzhao@xxxxxxxxxx>
> > Tested-by: Konstantin Kharlamov <Hi-Angel@xxxxxxxxx>
> > ---
> >  Documentation/vm/index.rst        |   1 +
> >  Documentation/vm/multigen_lru.rst |  62 +++++
>
> The description of user visible interfaces should go to
> Documentation/admin-guide/mm
>
> Documentation/vm/multigen_lru.rst should have contained design description
> and the implementation details and it would be great to actually have such
> document.

Will do, thanks.

> >  include/linux/nodemask.h          |   1 +
> >  mm/vmscan.c                       | 415 ++++++++++++++++++++++++++++++
> >  4 files changed, 479 insertions(+)
> >  create mode 100644 Documentation/vm/multigen_lru.rst
> >
> > diff --git a/Documentation/vm/index.rst b/Documentation/vm/index.rst
> > index 6f5ffef4b716..f25e755b4ff4 100644
> > --- a/Documentation/vm/index.rst
> > +++ b/Documentation/vm/index.rst
> > @@ -38,3 +38,4 @@ algorithms. If you are looking for advice on simply allocating memory, see the
> >     unevictable-lru
> >     z3fold
> >     zsmalloc
> > +   multigen_lru
> > diff --git a/Documentation/vm/multigen_lru.rst b/Documentation/vm/multigen_lru.rst
> > new file mode 100644
> > index 000000000000..6f9e0181348b
> > --- /dev/null
> > +++ b/Documentation/vm/multigen_lru.rst
> > @@ -0,0 +1,62 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +=====================
> > +Multigenerational LRU
> > +=====================
> > +
> > +Quick start
> > +===========
> > +Runtime configurations
> > +----------------------
> > +:Required: Write ``1`` to ``/sys/kernel/mm/lru_gen/enable`` if the
> > +  feature wasn't enabled by default.
>
> Required for what? This sentence seem to lack context. Maybe add an
> overview what is Multigenerational LRU so that users will have an idea what
> these knobs control.

Apparently I left an important part of this quick start in the next
patch, where Kconfig options are added. I'm wondering whether I should
squash the next patch into this one.

I always separate Kconfig changes and leave them in the last patch
because it gives me peace of mind knowing it'll never give any auto
bisectors a hard time. But I've seen people not follow this practice,
and I'm tempted to do the same. Can anybody remind me whether it's
considered bad practice to have code changes and Kconfig changes in
the same patch?

> > +
> > +Recipes
> > +=======
>
> Some more context here will be also helpful.

Will do.

> > +Personal computers
> > +------------------
> > +:Thrashing prevention: Write ``N`` to
> > +  ``/sys/kernel/mm/lru_gen/min_ttl_ms`` to prevent the working set of
> > +  ``N`` milliseconds from getting evicted. The OOM killer is invoked if
> > +  this working set can't be kept in memory. Based on the average human
> > +  detectable lag (~100ms), ``N=1000`` usually eliminates intolerable
> > +  lags due to thrashing. Larger values like ``N=3000`` make lags less
> > +  noticeable at the cost of more OOM kills.
> > +
> > +Data centers
> > +------------
> > +:Debugfs interface: ``/sys/kernel/debug/lru_gen`` has the following
> > +  format:
> > +  ::
> > +
> > +    memcg  memcg_id  memcg_path
> > +      node  node_id
> > +        min_gen  birth_time  anon_size  file_size
> > +        ...
> > +        max_gen  birth_time  anon_size  file_size
> > +
> > +  ``min_gen`` is the oldest generation number and ``max_gen`` is the
> > +  youngest generation number. ``birth_time`` is in milliseconds.
> > +  ``anon_size`` and ``file_size`` are in pages.
>
> And what does oldest and youngest generations mean from the user
> perspective?

Good question. Will add more details in the next spin.
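
To make the quoted debugfs format a bit more concrete, here is a minimal
Python sketch of how a userspace tool might parse text in that layout for
working set estimation. It is only an illustration, not part of the patch.
It assumes that "memcg" and "node" appear as literal keywords at the start
of their lines and that each generation line carries four integers
(generation number, birth_time in milliseconds, anon_size and file_size in
pages). The names parse_lru_gen, Memcg, Node, Generation and the SAMPLE
text are all hypothetical, and the sample numbers are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Generation:
    gen: int            # generation number (min_gen is oldest, max_gen youngest)
    birth_time_ms: int  # birth_time, in milliseconds
    anon_pages: int     # anon_size, in pages
    file_pages: int     # file_size, in pages


@dataclass
class Node:
    node_id: int
    generations: list = field(default_factory=list)


@dataclass
class Memcg:
    memcg_id: int
    path: str
    nodes: list = field(default_factory=list)


def parse_lru_gen(text):
    """Parse text laid out like the quoted lru_gen debugfs format.

    Assumption: 'memcg' and 'node' are literal keywords; every other
    non-empty line is a generation line with four integer fields.
    """
    memcgs = []
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue
        if parts[0] == "memcg":
            path = parts[2] if len(parts) > 2 else ""
            memcgs.append(Memcg(int(parts[1]), path))
        elif parts[0] == "node":
            memcgs[-1].nodes.append(Node(int(parts[1])))
        else:
            gen, birth, anon, file_ = (int(p) for p in parts[:4])
            memcgs[-1].nodes[-1].generations.append(
                Generation(gen, birth, anon, file_))
    return memcgs


# Made-up sample text, for illustration only (not real kernel output).
SAMPLE = """\
memcg 1 /example
 node 0
  3 9000 10 20
  4 5000 30 40
  5 1000 50 60
"""

if __name__ == "__main__":
    for memcg in parse_lru_gen(SAMPLE):
        for node in memcg.nodes:
            total = sum(g.anon_pages + g.file_pages for g in node.generations)
            print(memcg.path, "node", node.node_id, "pages:", total)
```

A real tool would read /sys/kernel/debug/lru_gen instead of SAMPLE, which
needs debugfs mounted and sufficient privileges. The per-generation page
counts could then be summed over the youngest generations to estimate a
working set.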