On Fri 23-11-18 13:38:38, Daniel Vetter wrote:
> On Fri, Nov 23, 2018 at 12:12:37PM +0100, Michal Hocko wrote:
> > On Thu 22-11-18 17:51:05, Daniel Vetter wrote:
> > > We need to make sure implementations don't cheat and don't have a
> > > possible schedule/blocking point deeply buried where review can't
> > > catch it.
> > >
> > > I'm not sure whether this is the best way to make sure all the
> > > might_sleep() callsites trigger, and it's a bit ugly in the code flow.
> > > But it gets the job done.
> >
> > Yeah, it is quite ugly. Especially because it makes the DEBUG config
> > behavior very different. So is this really worth it? Has this already
> > discovered any existing bug?
>
> Given that we need an oom trigger to hit this we're not hitting this in CI
> (oom is just way too unpredictable to even try). I'd kinda like to also add
> some debug interface so I can provoke an oom kill of a specially prepared
> process, to make sure we can reliably exercise this path without killing
> the kernel accidentally. We do similar tricks for our shrinker already.

Create a task with oom_score_adj = 1000 and trigger the oom killer via
sysrq, and you should get a predictable oom invocation and execution (a
minimal userspace sketch is appended below).

[...]

> Wrt the behavior difference: I guess we could put another counter into the
> task struct, and change might_sleep() to check it. All under
> CONFIG_DEBUG_ATOMIC_SLEEP only ofc. That would avoid the preempt-disable
> side effect. My worry with that is that people will spot it, and abuse it
> in creative ways that do affect semantics. See horrors like
> drm_can_sleep() (and I'm sure gfx folks are not the only ones who
> seriously lacked taste here).
>
> Up to the experts really how to best paint this shed I think.

Actually I would like a way to say non_block_{begin,end} and have
might_sleep() fire inside that context (a rough sketch of what I mean is
appended below as well).
-- 
Michal Hocko
SUSE Labs
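
A minimal sketch of the sysrq approach (completely untested; assumes root
and CONFIG_MAGIC_SYSRQ; the procfs paths are real, everything else is just
illustration):

	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	static void write_file(const char *path, const char *val)
	{
		FILE *f = fopen(path, "w");

		if (!f) {
			perror(path);
			exit(1);
		}
		fputs(val, f);
		fclose(f);
	}

	int main(void)
	{
		/*
		 * oom_score_adj = 1000 means this task always wins the
		 * badness comparison, so the victim selection is predictable.
		 */
		write_file("/proc/self/oom_score_adj", "1000");

		/*
		 * sysrq 'f' invokes the oom killer right away, no need to
		 * actually exhaust memory.
		 */
		write_file("/proc/sysrq-trigger", "f");

		pause();	/* wait for the oom killer's SIGKILL */
		return 0;
	}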
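
And a rough sketch of the non_block_{begin,end} idea (the names and the
non_block_count field in task_struct are made up here, nothing like this
exists yet):

	#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
	static inline void non_block_begin(void)
	{
		current->non_block_count++;
	}

	static inline void non_block_end(void)
	{
		/* warn on unbalanced begin/end */
		WARN_ON(current->non_block_count-- == 0);
	}
	#else
	static inline void non_block_begin(void) { }
	static inline void non_block_end(void) { }
	#endif

___might_sleep() would then complain whenever current->non_block_count is
non-zero, without any preempt-disable side effect, and a caller would wrap
the critical section like:

	non_block_begin();
	/* call into code that must not block; callback name made up */
	ret = invalidate_callback(data);
	non_block_end();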