On Thu, Aug 15, 2019 at 03:12:11PM +0200, Daniel Vetter wrote: > On Thu, Aug 15, 2019 at 3:04 PM Jason Gunthorpe <jgg@xxxxxxxx> wrote: > > > > On Thu, Aug 15, 2019 at 10:44:29AM +0200, Michal Hocko wrote: > > > > > As the oom reaper is the primary guarantee of the oom handling forward > > > progress it cannot be blocked on anything that might depend on blockable > > > memory allocations. These are not really easy to track because they > > > might be indirect - e.g. notifier blocks on a lock which other context > > > holds while allocating memory or waiting for a flusher that needs memory > > > to perform its work. > > > > But lockdep *does* track all this and fs_reclaim_acquire() was created > > to solve exactly this problem. > > > > fs_reclaim is a lock and it flows through all the usual lockdep > > schemes like any other lock. Any time the page allocator wants to do > > something the would deadlock with reclaim it takes the lock. > > > > Failure is expressed by a deadlock cycle in the lockdep map, and > > lockdep can handle arbitary complexity through layers of locks, work > > queues, threads, etc. > > > > What is missing? > > Lockdep doens't seen everything by far. E.g. a wait_event will be > caught by the annotations here, but not by lockdep. Sure, but the wait_event might be OK if its progress isn't contingent on fs_reclaim, ie triggered from interrupt, so why ban it? > And since we're talking about mmu notifiers here and gpus/dma engines. > We have dma_fence_wait, which can wait for any hw/driver in the system > that takes part in shared/zero-copy buffer processing. Which at least > on the graphics side is everything. This pulls in enormous amounts of > deadlock potential that lockdep simply is blind about and will never > see. It seems very risky to entagle a notifier widely like that. It looks pretty sure that notifiers are fs_reclaim, so at a minimum that wait_event can't be contingent on anything that is doing GFP_KERNEL or it will deadlock - and blockable doesn't make that sleep safe. Avoiding an uncertain wait_event under notifiers would seem to be the only reasonable design here.. Jason