On Wed, Apr 21, 2021 at 11:46 AM <Peter.Enderborg@xxxxxxxx> wrote:
>
> On 4/21/21 8:28 PM, Shakeel Butt wrote:
> > On Wed, Apr 21, 2021 at 10:06 AM peter enderborg
> > <peter.enderborg@xxxxxxxx> wrote:
> >> On 4/20/21 3:44 AM, Shakeel Butt wrote:
> > [...]
> >> I think this is the wrong way to go.
> > Which one? Are you talking about the kernel one? We already talked out
> > of that. To decide to OOM, we need to look at a very diverse set of
> > metrics and it seems like that would be very hard to do flexibly
> > inside the kernel.
> You dont need to decide to oom, but when oom occurs you
> can take a proper action.

No, we want the flexibility to decide when to oom-kill. Kernel is very
conservative in triggering the oom-kill.

> > [...]
> > Actually no. It is missing the flexibility to monitor metrics which a
> > user care and based on which they decide to trigger oom-kill. Not sure
> > how will watchdog replace psi/vmpressure? Userspace keeps petting the
> > watchdog does not mean that system is not suffering.
>
> The userspace should very much do what it do. But when it
> does not do what it should do, including kick the WD. Then
> the kernel kicks in and kill a pre defined process or as many
> as needed until the monitoring can start to kick and have the
> control.
>

Roman already suggested something similar (i.e. oom-killer core and
extended and core watching extended) but completely in userspace. I
don't see why we would want to do that in the kernel instead.

> >
> > In addition oom priorities change dynamically and changing it in your
> > system seems very hard. Cgroup awareness is missing too.
>
> Why is that hard? Moving a object in a rb-tree is as good it get.
>

It is a group of objects. Anyways that is implementation detail.

The message I got from this exchange is that we can have a watchdog
(userspace or kernel) to further improve the reliability of userspace
oom-killers.
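
For illustration only, the watchdog fallback discussed in this thread could
be sketched in userspace roughly as below. This is a sketch under
assumptions, not an existing implementation: the class name Watchdog, the
timeout value, the list of pre-defined victim PIDs and the injected kill
callable are all hypothetical. A real user would likely pass something like
lambda pid: os.kill(pid, signal.SIGKILL) as the kill callable.

```python
import time


class Watchdog:
    """Hypothetical fallback for a userspace oom-killer.

    The oom-killer calls pet() while it is healthy. A separate small
    monitor calls check() periodically; if the oom-killer has stopped
    petting, one pre-defined victim is killed per check() call until
    petting resumes or the victim list is exhausted.
    """

    def __init__(self, timeout, victims, kill, clock=time.monotonic):
        self.timeout = timeout        # seconds allowed between pets (assumed value)
        self.victims = list(victims)  # pre-defined PIDs, in kill order (assumed)
        self.kill = kill              # injected so the sketch runs without signals
        self.clock = clock            # injected so tests can control time
        self.last_pet = clock()

    def pet(self):
        """Record that the oom-killer is still alive and making progress."""
        self.last_pet = self.clock()

    def check(self):
        """Return True if the watchdog has expired.

        On expiry, kill the next pre-defined victim, if any remain.
        """
        if self.clock() - self.last_pet <= self.timeout:
            return False
        if self.victims:
            self.kill(self.victims.pop(0))
        return True
```

Note that this only covers the "oom-killer stopped responding" case. As
said above, a petted watchdog does not mean the system is not suffering, so
it complements psi/vmpressure monitoring rather than replacing it.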