On 4/25/21 3:08 AM, Tetsuo Handa wrote:
> On 2021/04/25 1:19, peter enderborg wrote:
>>> I don't think this proposal is a watchdog. I think this proposal is
>>> a timer based process killer, based on an assumption that any slowdown
>>> which prevents the monitor process from pinging for more than 0.5 seconds
>>> (if HZ == 1000) is caused by memory pressure.
>> You missing the point. The oom killer is a example of a work that it can do.
>> it is one policy. The idea is that you should have a policy that fits your needs.
> Implementing policy which can run in kernel from timer interrupt context is
> quite limited, for it is not allowed to perform operations that might sleep. See
>
>   [RFC] memory reserve for userspace oom-killer
>   https://lkml.kernel.org/r/CALvZod7vtDxJZtNhn81V=oE-EPOf=4KZB2Bv6Giz+u3bFFyOLg@mail.gmail.com
>
> for implementing possibly useful policy.

If you need a more complex approach, you might need a work queue. A SIGTERM solution, for example, might look like that: you send SIGTERM, wait some time, and then send SIGKILL.

>> oom_score_adj is suitable for a android world. But it might be based on
>> uid's if your priority is some users over other. Or a memcg. Or as
>> Christophe Leroy want the current. The policy is only a example that
>> fits a one area.
> Horrible idea. Imagine a kernel module that randomly sends SIGTERM/SIGKILL
> to "current" thread. How normal systems can survive? A normal system is not
> designed to survive random signals.

I think you need to see it in the context of a watchdog. It might be problematic, but it has a good statistical chance of hitting a CPU hog. And seen as a watchdog, the alternative is a system reset. You take a chance. Reboot should be the last resort.

I can imagine a kernel module that randomly sends SIGTERM/SIGKILL; we already have that. It is called oom-kill.
This is *exactly* the problem.

>
>> You need to describe your prioritization, in android it is
>> oom_score_adj. For example I would very much have a policy that sends
>> sigterm instead of sigkill.
> That's because Android framework is designed to survive random signals
> (in order to survive memory pressure situation).

It uses signals a lot to control the system. It uses them differently than you would with a shell or a window manager.

>
>> But the integration with oom is there because
>> it is needed. Maybe a bad choice for political reasons but I don't it a
>> good idea to hide the intention. Please don't focus on the oom part.
> I wonder what system other than Android framework can utilize this module.

I think it will be useful for embedded systems as well.

> By the way, there already is "Software Watchdog" ( drivers/watchdog/softdog.c )
> which some people might call it "soft watchdog". It is very confusing to name
> your module as "softwatchdog". Please find a different name.
>

It is mentioned in the patch set. I had the idea of adding this function to that driver, but I decided it was better to keep them separate, to point out that this feature is meant to be "soft" rather than hard.
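Below is a minimal userspace sketch of the SIGTERM-then-SIGKILL escalation discussed in this thread. It assumes a caller-chosen grace period and a 10 ms polling interval. It runs in ordinary process context, where sleeping is allowed. Inside the kernel, the wait could not happen in the timer callback itself and would have to be deferred, for example to a work queue. The function names (terminate_escalating, wait_with_timeout, spawn_sleeper) are hypothetical and are not part of the patch set.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Poll every 10 ms, for up to grace_ms, for pid to exit.
 * Returns 1 if it exited (and was reaped), 0 on timeout, -1 on error. */
static int wait_with_timeout(pid_t pid, int grace_ms, int *status)
{
    const struct timespec tick = { 0, 10L * 1000 * 1000 }; /* 10 ms */
    assert(grace_ms >= 0);
    for (int waited = 0; waited <= grace_ms; waited += 10) {
        pid_t r = waitpid(pid, status, WNOHANG);
        if (r == pid)
            return 1;
        if (r < 0)
            return -1;
        nanosleep(&tick, NULL);
    }
    return 0;
}

/* Send SIGTERM, give the task grace_ms to exit, then escalate to SIGKILL.
 * Returns the signal that terminated the task, 0 if it exited normally
 * (e.g. a SIGTERM handler called exit()), or -1 on error. */
static int terminate_escalating(pid_t pid, int grace_ms)
{
    int status = 0;
    if (kill(pid, SIGTERM) < 0)
        return -1;
    int r = wait_with_timeout(pid, grace_ms, &status);
    if (r < 0)
        return -1;
    if (r == 0) {
        /* Still alive after the grace period: SIGKILL cannot be caught. */
        if (kill(pid, SIGKILL) < 0 || waitpid(pid, &status, 0) != pid)
            return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Test helper (hypothetical): fork a child that sleeps forever, with
 * SIGTERM ignored if ignore_term is nonzero, default otherwise. The pipe
 * ensures the child has set its disposition before it is signalled. */
static pid_t spawn_sleeper(int ignore_term)
{
    int fds[2];
    char ok = 'x';
    if (pipe(fds) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                          /* child */
        close(fds[0]);
        signal(SIGTERM, ignore_term ? SIG_IGN : SIG_DFL);
        if (write(fds[1], &ok, 1) != 1)
            _exit(1);
        close(fds[1]);
        for (;;)
            pause();
    }
    close(fds[1]);                           /* parent */
    ssize_t n = read(fds[0], &ok, 1);
    close(fds[0]);
    if (n != 1) {
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
        return -1;
    }
    return pid;
}
```

With a 200 ms grace period, a child with the default SIGTERM disposition should be stopped by SIGTERM. A child that ignores SIGTERM should only be stopped by the escalation to SIGKILL. Polling with WNOHANG keeps the sketch simple. A real implementation would more likely use a timer or a delayed work item than a busy-ish loop.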