On Tue, Nov 26, 2013 at 08:23:46PM -0800, Ben Widawsky wrote:
> On Tue, Nov 26, 2013 at 04:55:50PM -0800, Ben Widawsky wrote:
> > If we end up calling the shrinker, which in turn requires the OOM
> > killer, we may end up infinitely waiting for a process to die if the OOM
> > chooses. The case that this prevents occurs in execbuf. The forked
> > variants of gem_evict_everything are a good way to hit it. This is
> > exacerbated by Daniel's recent patch to give OOM precedence to the GEM
> > tests.
> >
> > It's a twisted form of a deadlock.
> >
> > What occurs is the following (assume just 2 procs)
> > 1. proc A gets to execbuf while out of memory, gets struct_mutex.
> > 2. OOM killer comes in and chooses proc B
> > 3. proc B closes its fds, which requires struct_mutex, blocks
> > 4. OOM killer waits for B to die before killing another process (this
> > part is speculative)
> >
> > Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
> > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Signed-off-by: Ben Widawsky <ben@xxxxxxxxxxxx>
>
> I'd still like to know if I am crazy, but I'm now trying to defer the
> stuff we do on file close without using any allocs. Just an update...
>

Deferring the file-close work to a workqueue still has similar problems.
It could be because deferring the context cleanup means we don't
actually free much space, so the OOM kill isn't enough, or [more
likely] something else is going on. Maybe it's my bug.

I am really out of ideas at the moment. The system just becomes
unresponsive after all threads end up blocked waiting for struct_mutex.
I know I'd seen such problems in the past with gem_evict_everything, but
this time around I seem to be the sole cause (and not all my machines
hit it).

Sorry for the noise - just totally burning out on this one.

-- 
Ben Widawsky, Intel Open Source Technology Center
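
For anyone who wants to poke at the four steps from the original patch
description outside the kernel, here is a rough userspace model in
Python. It is only a sketch of the suspected wait cycle, not driver
code: simulate(), its parameters, and the timeouts are all made up for
illustration, and step 4 is as speculative here as it is in the
description. The a_holds_lock_while_waiting=False path is a control
case for the model, not a fix proposed in this thread.

```python
import threading


def simulate(a_holds_lock_while_waiting=True, close_timeout=0.2, oom_timeout=0.5):
    """Replay steps 1-4; return True if the OOM killer sees proc B exit."""
    struct_mutex = threading.Lock()      # stand-in for the driver's struct_mutex
    a_has_taken_lock = threading.Event()
    memory_freed = threading.Event()     # what proc A is waiting for
    b_exited = threading.Event()

    def proc_a():
        # Step 1: A reaches execbuf while out of memory and takes struct_mutex.
        struct_mutex.acquire()
        a_has_taken_lock.set()
        if a_holds_lock_while_waiting:
            memory_freed.wait()          # stalls with the lock still held
            struct_mutex.release()
        else:
            struct_mutex.release()       # control case: lock dropped before stalling
            memory_freed.wait()

    def proc_b():
        # Step 3: B closes its fds, which needs struct_mutex.
        # The timeout exists only so that this model terminates.
        if struct_mutex.acquire(timeout=close_timeout):
            struct_mutex.release()
            b_exited.set()

    a = threading.Thread(target=proc_a)
    a.start()
    a_has_taken_lock.wait()

    # Step 2: the OOM killer chooses proc B.
    b = threading.Thread(target=proc_b)
    b.start()

    # Step 4 (speculative): the OOM killer waits for B to die before
    # it would pick another victim.
    victim_died = b_exited.wait(timeout=oom_timeout)

    memory_freed.set()                   # let A finish so the run ends
    a.join()
    b.join()
    return victim_died
```

With the defaults, simulate() returns False: B never gets the lock, so
the modelled OOM killer never sees its victim exit. With
a_holds_lock_while_waiting=False it returns True.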