On Wed 17-10-18 19:06:22, Tetsuo Handa wrote:
> syzbot is hitting RCU stall at shmem_fault() [1].
> This is because memcg-OOM events with no eligible task (current thread
> is marked as OOM-unkillable) continued calling dump_header() from
> out_of_memory() enabled by commit 3100dab2aa09dc6e ("mm: memcontrol:
> print proper OOM header when no eligible victim left.").
>
> Michal proposed ratelimiting dump_header() [2]. But I don't think that
> that patch is appropriate because that patch does not ratelimit
>
>   "%s invoked oom-killer: gfp_mask=%#x(%pGg), nodemask=%*pbl, order=%d, oom_score_adj=%hd\n"
>   "Out of memory and no killable processes...\n"
>
> messages which can be printed for every few milliseconds (i.e. effectively
> denial of service for console users) until the OOM situation is solved.
>
> Let's make sure that next dump_header() waits for at least 60 seconds from
> previous "Out of memory and no killable processes..." message. Michal is
> thinking that any interval is meaningless without knowing the printk()
> throughput. But since printk() is synchronous unless handed over to
> somebody else by commit dbdda842fe96f893 ("printk: Add console owner and
> waiter logic to load balance console writes"), it is likely that all OOM
> messages from this out_of_memory() request is already flushed to consoles
> when pr_warn("Out of memory and no killable processes...\n") returned.
> Thus, we will be able to allow console users to do what they need to do.
>
> To summarize, this patch allows threads in requested memcg to complete
> memory allocation requests for doing recovery operation, and also allows
> administrators to manually do recovery operation from console if
> OOM-unkillable thread is failing to solve the OOM situation automatically.

Could you explain why this is any better than using a well established
ratelimit approach?

-- 
Michal Hocko
SUSE Labs