On Tue, Sep 22, 2020 at 10:01 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Tue 22-09-20 09:51:30, Shakeel Butt wrote:
> > On Tue, Sep 22, 2020 at 9:34 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
> > >
> > > On Tue 22-09-20 09:29:48, Shakeel Butt wrote:
> [...]
> > > > Anyways, what do you think of the in-kernel PSI based
> > > > oom-kill trigger. I think Johannes had a prototype as well.
> > >
> > > We have talked about something like that in the past and established
> > > that auto tuning for oom killer based on PSI is almost impossible to get
> > > right for all potential workloads and that so this belongs to userspace.
> > > The kernel's oom killer is there as a last resort when system gets close
> > > to meltdown.
> >
> > The system is already in meltdown state from the users perspective. I
> > still think allowing the users to optionally set the oom-kill trigger
> > based on PSI makes sense. Something like 'if all processes on the
> > system are stuck for 60 sec, trigger oom-killer'.
>
> We already do have watchdogs for that no? If you cannot really schedule
> anything then soft lockup detector should fire. In a meltdown state like
> that the reboot is likely the best way forward anyway.

Yes, the soft lockup detector can catch this situation, but I still think
we can do better than panic/reboot.

Anyways, I think we now know the reason for this extreme pressure, and
I just wanted to share it in case someone else is facing a similar
situation.

There were several thousand TCP delayed ACKs queued on the system. The
system was under memory pressure, and the alloc_skb(GFP_ATOMIC) calls for
delayed ACKs were either stealing from reclaimers or failing. For the
delayed ACKs whose allocation failed, the kernel rescheduled them
indefinitely. So these failing allocations for delayed ACKs kept the
system in this lockup state for hours. The commit a37c2134bed6 ("tcp:
add exponential backoff in __tcp_send_ack()") recently added the fix for
this situation.
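
To illustrate why backoff helps here, below is a minimal userspace sketch
in Python of exponential backoff with a cap. It is not the kernel code from
that commit. The names next_retry_delay() and simulate(), the 200 ms base
delay, and the 120 s cap are all assumptions made up for this illustration.
The point is only that each failed attempt doubles the wait before the next
retry, so thousands of failing retries stop hammering the allocator at a
constant rate.

```python
# Illustrative values only; they are assumptions, not taken from the commit.
BASE_DELAY_MS = 200        # hypothetical delay before the first retry
MAX_DELAY_MS = 120_000     # hypothetical upper bound on any retry delay


def next_retry_delay(retry):
    """Return (delay_ms, updated_retry) for one failed attempt.

    The delay doubles with every recorded failure. Once the uncapped
    delay reaches MAX_DELAY_MS, the retry counter stops growing and the
    returned delay is clamped to the cap.
    """
    delay = BASE_DELAY_MS << retry
    if delay < MAX_DELAY_MS:
        retry += 1
    return min(delay, MAX_DELAY_MS), retry


def simulate(failures):
    """Return the list of delays used across consecutive failures."""
    delays = []
    retry = 0
    for _ in range(failures):
        delay, retry = next_retry_delay(retry)
        delays.append(delay)
    return delays


if __name__ == "__main__":
    # Print the retry schedule for twelve consecutive allocation failures.
    for attempt, delay in enumerate(simulate(12), start=1):
        print(f"failure {attempt}: retry in {delay} ms")
```

With a fixed retry interval, N stuck sockets generate a steady stream of
allocation attempts for as long as the pressure lasts. With the doubling
schedule above, the attempt rate per socket drops quickly and levels off at
one attempt per cap interval.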