On Mon 13-12-21 05:15:21, Alexey Avramov wrote:
> So, the problem described by Artem S. Tashkinov in 2019 is still easily
> reproduced in 2021. The assurances of the maintainers that they consider
> the thrashing and near-OOM stalls to be a serious problems are difficult to
> take seriously while they ignore the obvious solution: if reclaiming file
> caches leads to thrashing, then you just need to prohibit deleting the file
> cache. And allow the user to control its minimum amount.

These are rather strong claims. While this might sound like a very easy
solution/workaround, I have already tried to express my concerns [1].
Really, you should realize that such a knob would become carved into
stone as soon as we merge this, and we will need to support it forever!
It is really painful (if possible at all) to deprecate any tunable knobs
that cannot be supported anymore because the underlying implementation
doesn't allow for that. So we would absolutely need to be sure this is
the right approach to the problem. I am not convinced about that though.

How does the admin know what the limit should be set to for a certain
workload? What if the workload characteristics change and the existing
setting is just too restrictive? What if the workload is thrashing over
something different than anon/file memory (e.g. any other cache that we
have or might have in the future)?

As you have pointed out, there were general recommendations to use user
space based oom killer solutions which can be tuned for the specific
workload or used in an environment where the disruptive OOM killer
action is less of a problem because the workload can be restarted easily
without too much harm caused by the oom killer. Please keep in mind that
there are many more different workloads that have different requirements
and an oom killer invocation can be really much worse than slow
progress due to ephemeral, peak or even longer term thrashing or heavy
refaults.
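
To illustrate what such a user space policy can key on, here is a minimal sketch, not taken from any existing tool. It assumes the kernel exposes pressure stall information (PSI) at /proc/pressure/memory in the usual "some"/"full" line format. The helper names and the 20% threshold are hypothetical and chosen purely for illustration.

```python
# Hypothetical sketch of a user space memory pressure check based on PSI.
# Assumes the kernel exposes /proc/pressure/memory with lines such as:
#   some avg10=0.00 avg60=0.00 avg300=0.00 total=0
#   full avg10=0.00 avg60=0.00 avg300=0.00 total=0

PSI_MEMORY_PATH = "/proc/pressure/memory"


def parse_psi(text):
    """Parse PSI text into {"some": {...}, "full": {...}}.

    avg* values are percentages (float); total is in microseconds (int).
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        values = {}
        for pair in pairs:
            key, _, raw = pair.partition("=")
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result


def is_thrashing(psi, full_avg10_limit=20.0):
    """Return True when the 10s "full" stall average reaches the limit.

    The 20% default is an arbitrary illustration, not a recommendation.
    """
    return psi.get("full", {}).get("avg10", 0.0) >= full_avg10_limit


def read_memory_pressure(path=PSI_MEMORY_PATH):
    """Read and parse the current memory pressure from the given path."""
    with open(path) as f:
        return parse_psi(f.read())
```

A daemon would call read_memory_pressure() periodically and apply whatever policy suits the workload once is_thrashing() returns True. The threshold lives entirely in user space, so it can be changed per workload without the kernel having to commit to it.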

The kernel OOM killer acts as the last resort solution and therefore
stays really conservative. I do believe that integrating PSI metrics
into that decision is the right direction. It is not a trivial one
though.

Why is this a better approach than a simple limit? Well, for one, it is
a feedback based solution. The system knows it is thrashing and can
estimate how hard. It is not about a specific type of memory because we
can detect refaults on both file and anonymous memory (it can be
extended should there be a need for future types of caches or
reclaimable memory). Memory reclaim can work with that information and
balance different resources dynamically based on the available feedback.
MM code will not need to expose implementation details about how the
reclaim works and so we do not bind ourselves to long-term specifics.
See the difference?

If you can live with a premature and over-eager OOM killer policy then
all fine. Use existing userspace solutions. If you want to work on an
in-kernel solution, please try to understand the complexity and the
historical experience with similar solutions first. It also helps to
understand that there are no simple solutions on the table. MM reclaim
code has evolved over many years. I strongly suspect we have run out of
simple solutions already. We also got burnt many times. Let's not repeat
some errors again.

[1] http://lkml.kernel.org/r/Ya3fG2rp+860Yb+t@xxxxxxxxxxxxxx
--
Michal Hocko
SUSE Labs