> > > > +0x0002 clear the accessed bit in leaf page table entries **in large
> > > > + batches**, when MMU sets it (e.g., on x86)
> > >
> > > Is extra markup really needed here...
> > >
> > > > +0x0004 clear the accessed bit in non-leaf page table entries **as
> > > > + well**, when MMU sets it (e.g., on x86)
> > >
> > > ... and here?
> >
> > Will do.
> >
> > > As for the descriptions, what is the user-visible effect of these
> > > features? How are the different modes of clearing the accessed bit
> > > reflected in, say, GUI responsiveness, database TPS, or the
> > > probability of OOM?
> >
> > These remain to be seen :) I just added these switches in v7, per Mel's
> > request from the meeting we had. These were never tested in the field.
>
> I see :)
>
> It would be nice to have a description and/or examples of user-visible
> effects once there is some insight into what these features do.

How does the following sound?

Clearing the accessed bit in large batches can theoretically cause lock
contention (mmap_lock), and if that happens, the 0x0002 switch can
disable this feature. In this case the multigenerational LRU suffers a
minor performance degradation.

Clearing the accessed bit in non-leaf page table entries was only
verified on Intel and AMD, and if it causes problems on other x86
varieties, the 0x0004 switch can disable this feature. In this case the
multigenerational LRU suffers a negligible performance degradation.

> > > > +:Debugfs interface: ``/sys/kernel/debug/lru_gen`` has the following
> > >
> > > Is the debugfs interface relevant only for datacenters?
> >
> > For the moment, yes.
>
> And what will happen if somebody uses these interfaces outside
> datacenters? As soon as there is a sysfs interface, somebody will surely
> play with it.
>
> I think the job schedulers might be the most important user of that
> interface, but the documentation should not presume it is the only user.
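For illustration, the two switches described above could be toggled with
something like the sketch below. The sysfs path and the full feature mask
are assumptions on my part (based on how the series exposes these
features), not something stated in the patch text; only the 0x0002 and
0x0004 bits come from the description above.

```shell
# Sketch only: the path and overall bit layout are assumed, not verified.
ENABLED=/sys/kernel/mm/lru_gen/enabled

if [ -w "$ENABLED" ]; then
    cat "$ENABLED"              # current feature mask
    # Drop 0x0002 if batched clearing of the accessed bit in leaf entries
    # causes mmap_lock contention, keeping the other features enabled.
    echo 0x0005 > "$ENABLED"
    # Drop 0x0004 instead, if clearing the accessed bit in non-leaf
    # entries misbehaves on a particular x86 variety.
    echo 0x0003 > "$ENABLED"
fi
```

On a kernel without the series (or without write permission to the file)
the sketch is a no-op.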
Other ideas are more like brainstorming than concrete use cases, e.g.,
for desktop users, these interfaces can in theory speed up hibernation
(suspend to disk); for VM users, they can again in theory support auto
ballooning. These niches are really minor and less explored compared
with the data center use cases, which have been dominant. I was hoping
we could focus on the essentials and take one step at a time. Later on,
if there is additional demand and resources, we can expand to cover
more use cases.

> > > > + job scheduler writes to this file at a certain time interval to
> > > > + create new generations, and it ranks available servers based on the
> > > > + sizes of their cold memory defined by this time interval. For
> > > > + proactive reclaim, a job scheduler writes to this file before it
> > > > + tries to land a new job, and if it fails to materialize the cold
> > > > + memory without impacting the existing jobs, it retries on the next
> > > > + server according to the ranking result.
> > >
> > > Is this knob only relevant for a job scheduler? Or can it be used in
> > > other use cases as well?
> >
> > There are other concrete use cases but I'm not ready to discuss them
> > yet.
>
> Here as well: as soon as there is an interface, it's not necessarily a
> "job scheduler" that will "write to this file"; anybody can write to
> that file. Please adjust the documentation to be more neutral regarding
> the use cases.

Will do.
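To make the working set estimation flow quoted above concrete, here is a
rough sketch of what a user of the debugfs file might do. The command
format ('+ memcg_id node_id max_gen_nr') and the example IDs are my
assumptions and may differ between versions of the series.

```shell
# Sketch only: the write command format and the IDs below are assumed.
LRU_GEN=/sys/kernel/debug/lru_gen

if [ -w "$LRU_GEN" ]; then
    # Create a new generation for memcg 1 on node 0; pages not accessed
    # afterwards age into older (colder) generations. '1 0 7' is a
    # made-up example: memcg_id 1, node_id 0, current max_gen_nr 7.
    echo '+ 1 0 7' > "$LRU_GEN"
    # Read back per-generation sizes; a scheduler (or any other user)
    # could rank servers by the amount of memory in generations older
    # than its chosen interval.
    cat "$LRU_GEN"
fi
```

As with the sysfs sketch, this is a no-op without the series or without
write permission to the file.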