On Tue, Apr 26, 2022 at 4:22 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, 26 Apr 2022 14:57:15 -0600 Yu Zhao <yuzhao@xxxxxxxxxx> wrote:
>
> > On Mon, Apr 11, 2022 at 8:16 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Wed, 6 Apr 2022 21:15:22 -0600 Yu Zhao <yuzhao@xxxxxxxxxx> wrote:
> > >
> > > > Add /sys/kernel/mm/lru_gen/enabled as a kill switch. Components that
> > > > can be disabled include:
> > > >   0x0001: the multi-gen LRU core
> > > >   0x0002: walking page table, when arch_has_hw_pte_young() returns
> > > >           true
> > > >   0x0004: clearing the accessed bit in non-leaf PMD entries, when
> > > >           CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y
> > > >   [yYnN]: apply to all the components above
> > > > E.g.,
> > > >   echo y >/sys/kernel/mm/lru_gen/enabled
> > > >   cat /sys/kernel/mm/lru_gen/enabled
> > > >   0x0007
> > > >   echo 5 >/sys/kernel/mm/lru_gen/enabled
> > > >   cat /sys/kernel/mm/lru_gen/enabled
> > > >   0x0005
> > >
> > > I'm shocked that this actually works. How does it work? Existing
> > > pages & folios are drained over time or synchronously?
> >
> > Basically we have a double-throw switch, and once flipped, new (isolated)
> > pages can only be added to the lists of the current implementation.
> > Existing pages on the lists of the previous implementation are
> > synchronously drained (isolated and then re-added), with
> > cond_resched() of course.
> >
> > > Supporting
> > > structures remain allocated, available for reenablement?
> >
> > Correct.
> >
> > > Why is it thought necessary to have this? Is it expected to be
> > > permanent?
> >
> > This is almost a must for large scale deployments/experiments.
> >
> > For deployments, we need to keep fix rollout (high priority) and
> > feature enabling (low priority) separate. Rolling out multiple
> > binaries works but will make the process slower and more painful.
> > So generally for each release, there is only one binary to roll out, and
> > unless it's impossible, new features are disabled by default. Once a
> > rollout completes, i.e., reaches enough population and remains stable,
> > new features are turned on gradually. If something goes wrong with a
> > new feature, we turn off that feature rather than roll back the
> > kernel.
> >
> > Similarly, for A/B experiments, we don't want to use two binaries.
>
> Please let's spell out this sort of high-level thinking in the
> changelogging.

Will do.

> From what you're saying, this is a transient thing. It sounds like
> this enablement is only needed when mglru is at an early stage. Once
> it has matured more then successive rollouts will have essentially the
> same mglru implementation and being able to disable mglru at runtime
> will no longer be required?

I certainly hope so. But realistically this switch is here to stay,
just like anything else added after careful planning or on a whim.

> I guess the capability is reasonably simple/small and is livable with,
> but does it have a long-term future?

I see it as a necessary evil.

> I mean, when organizations such as Google start adopting the mglru
> implementation which is present in Linus's tree we're, what, a year or
> more into the future? Will they still need a kill switch then?

There are two distinct possibilities:
1. Naturally the number of caps would grow. Old caps that have been
   proven keep the same values. New caps need to be flipped on/off for
   deployments/experiments.
2. The worst case scenario: this file becomes something like
   /sys/kernel/mm/transparent_hugepage/enabled. For different workloads,
   it's set to different values. Otherwise we'd have to build multiple
   kernel binaries.
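
For readers following the thread, the value semantics of the enabled file
listed in the quoted changelog can be sketched as a small userspace model.
This is only an illustration under stated assumptions, not the kernel
implementation: the names COMPONENTS, ALL_COMPONENTS, parse_enabled(),
format_enabled() and describe() are hypothetical, the sketch ignores the
arch_has_hw_pte_young() and CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG conditions,
and silently dropping unknown bits is a choice made here rather than
something the changelog states.

```python
# Toy userspace model of /sys/kernel/mm/lru_gen/enabled; not kernel code.
# Bit values come from the quoted changelog; all names are hypothetical.
COMPONENTS = {
    0x0001: "multi-gen LRU core",
    0x0002: "page table walks",
    0x0004: "clearing the accessed bit in non-leaf PMD entries",
}
ALL_COMPONENTS = 0x0007


def parse_enabled(text: str) -> int:
    """Turn a string written to the file into a component mask."""
    value = text.strip()
    if value in ("y", "Y"):
        return ALL_COMPONENTS
    if value in ("n", "N"):
        return 0
    # Base 0 accepts both "5" and "0x5". Dropping unknown bits is an
    # assumption of this sketch.
    return int(value, 0) & ALL_COMPONENTS


def format_enabled(mask: int) -> str:
    """Render a mask in the 0x%04x form used by the changelog examples."""
    return f"0x{mask:04x}"


def describe(mask: int) -> list[str]:
    """List the components whose bits are set in the mask."""
    return [name for bit, name in COMPONENTS.items() if mask & bit]


if __name__ == "__main__":
    for written in ("y", "5", "n"):
        mask = parse_enabled(written)
        print(written, "->", format_enabled(mask), describe(mask))
```

Under this model, writing 5 leaves the core and the non-leaf PMD component
on while page table walks stay off, which is the kind of per-cap flip
described in possibility 1 above.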