On Tue, Dec 22, 2020 at 11:42 AM Robbie Harwood <rharwood@xxxxxxxxxx> wrote:
>
> I believe you are assuming the consequent when you suggest that kernel
> developers should be somehow fixing this in userspace.
>
> To back up: the described problem is the manifestation of an interaction
> between swap and the OOM condition. The OOM killer is a
> popularly-understood piece of what goes on in the system during OOM, but
> it's not like the rest of the kernel can be ignored. (I would argue
> that part of the reason it's well understood is their insistence that it
> remain simple, but that's getting off into the weeds.)

No, the problem happens any time a resource becomes constrained: CPU,
memory, IO. It's not exclusively a swap problem. When swap pressure is
part of the problem, it depends on how swap is being used. Heavy IO
page-out is a good thing. Heavy IO page-in and page-out of the same
pages is a bad thing.

> So, several control points here:
>
> - OOM killer behavior. I think we're in agreement that this isn't the
>   thing that needs changed.

If you mean the kernel OOM killer, yeah, probably. That's generally
considered to be working correctly. It keeps the kernel functioning,
with forward progress being made, without any respect whatsoever to
user space priorities like system responsiveness.

> - Enabling swap. Swap is really slow (by virtue of not being RAM...)
>   and we don't seem willing to acknowledge that. If we want the system
>   to be snappy and responsive... we shouldn't be swapping.

This is not entirely correct. The chosen workload might be excessive
compared to the allocated resources. That does happen, I see it from
time to time, but it's not that common because it results in a lot of
pain for the user. So they stop doing it. This is an underprovisioned
system. If you aren't swapping at all, that means you have allocated
more resources than the workload requires. You've overprovisioned.
This is apparently quite common in the Kubernetes workflow, because
Kubernetes doesn't work properly with swap, somehow by design. So their
view is: don't create a swap device, just overprovision.

Swap is for evicting anonymous pages, pages that aren't backed by any
kind of file. If inactive anonymous pages can't be swapped, they have
to stay in memory. And when memory is under pressure, the kernel has no
choice but to reclaim file pages instead, i.e. evict pages that are
backed by files. This will end up looking a lot like swap thrashing.

Another factor is that there have been recent improvements in the swap
code to make dirty page eviction much better and avoid swap thrashing.
You'll need a 5.10 kernel for the most recent work on this.

>
> - Swap aggressiveness. Suggested by above, people want swap anyway.
>   (Sometimes it's for hibernation (not supported, but that stops no
>   one), sometimes it's for... historical reasons? Underprovisioning?)
>   This could be tuned to the use cases we actually want.

The idea of proper resource control is to use swap more effectively,
to reduce the heavy swap thrashing. It's not a problem to do dirty page
eviction (page-out). That frees memory and makes it less likely other
processes will thrash.

>
> - Education. Get people to a point where admins don't deploy swap on
>   systems that aren't going to hibernate. I'll readily admit this one
>   might be hardest.

That is bad advice. We do need swap.
https://chrisdown.name/2018/01/02/in-defence-of-swap.html

There's a nice tl;dr at the top and a summary at the bottom, and quite
a detailed explanation in the middle.

> And even possibly the (conceptually) simplest solution of all:
>
> - Swap usage monitoring as described for oomd... but in the kernel.
>   This saves you on all the overhead of running in userspace, if nothing
>   else.
This exists in the form of PSI, as well as cgroup v2:
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html

memory.swap.current
memory.swap.events
memory.swap.high
memory.swap.max

> But what really bothers me here is that, to my knowledge, no one has
> tried to actually make any of these happen in the kernel. There's a
> vague perception of what "the kernel devs" want, as if they're some
> other, but... has anyone asked? If so, we should be able to quote what
> the response was, and a good design proposal should include it as an
> explanation of why that route wasn't taken.

I'm not even sure what you're asking for. There is no such thing as a
one-size-fits-all set of policies for resource control. There are
kernel-side components for this, as well as user space components, to
implement a policy.

-- 
Chris Murphy
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx
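
For anyone who wants to poke at the kernel-side interfaces mentioned in
the reply (PSI and the cgroup v2 memory.swap.* files), here is a minimal
sketch in Python. It is only an illustration, not how oomd or any other
tool does it. Assumptions: PSI is enabled and exposed at
/proc/pressure/memory, cgroup v2 is mounted at /sys/fs/cgroup, and
user.slice is just an example of a non-root cgroup. The helper names
parse_psi and read_swap_controls are made up for this example.

```python
from pathlib import Path

# The four cgroup v2 swap control files listed in the message.
SWAP_FILES = (
    "memory.swap.current",
    "memory.swap.events",
    "memory.swap.high",
    "memory.swap.max",
)


def parse_psi(text):
    """Parse PSI output such as
    'some avg10=0.00 avg60=0.00 avg300=0.00 total=0'
    into a nested dict keyed by 'some' / 'full'."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        values = {}
        for field in fields:
            key, _, raw = field.partition("=")
            # 'total' is a cumulative stall time in microseconds;
            # the avg* fields are percentages.
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result


def read_swap_controls(cgroup_dir):
    """Return the raw contents of whichever memory.swap.* files exist
    and are readable in the given cgroup directory (assumed path)."""
    found = {}
    for name in SWAP_FILES:
        path = Path(cgroup_dir) / name
        try:
            found[name] = path.read_text().strip()
        except OSError:
            # Missing file, no permission, or controller not enabled.
            continue
    return found


if __name__ == "__main__":
    try:
        print(parse_psi(Path("/proc/pressure/memory").read_text()))
    except OSError:
        print("PSI not available on this system")
    # Hypothetical example cgroup; adjust to one that exists locally.
    print(read_swap_controls("/sys/fs/cgroup/user.slice"))
```

A user space policy daemon would poll values like these (or use PSI
triggers) and decide what to do. The kernel only supplies the numbers
and the limits.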