On Tue 17-03-20 12:00:45, Ami Fischman wrote:
> On Tue, Mar 17, 2020 at 11:26 AM Robert Kolchmeyer
> <rkolchmeyer@xxxxxxxxxx> wrote:
> >
> > On Tue, Mar 10, 2020 at 3:54 PM David Rientjes <rientjes@xxxxxxxxxx> wrote:
> > >
> > > Robert, could you elaborate on the user-visible effects of this issue that
> > > caused it to initially get reported?
> >
> > Ami (now cc'ed) knows more, but here is my understanding.
>
> Robert's description of the mechanics we observed is accurate.
>
> We discovered this regression in the oom-killer's behavior when
> attempting to upgrade our system. The fraction of the system that
> went unhealthy due to this issue was approximately equal to the
> _sum_ of all other causes of unhealth, which are many and varied,
> but each of which contribute only a small amount of
> unhealth. This issue forced a rollback to the previous kernel
> where we ~never see this behavior, returning our unhealth levels
> to the previous background levels.

Could you be more specific on the good vs. bad kernel versions? Because I
do not remember any oom changes that would affect the
time-to-check-time-to-kill race. The timing might be slightly different
in each kernel version of course.
-- 
Michal Hocko
SUSE Labs