On Fri, Apr 07, 2017 at 06:48:58PM +0200, Andrea Arcangeli wrote:
> On Fri, Apr 07, 2017 at 04:30:11PM +0100, Chris Wilson wrote:
> > Not getting hangs is a good sign, but lockdep doesn't like it:
> >
> > [ 460.684901] WARNING: CPU: 1 PID: 172 at kernel/workqueue.c:2418 check_flush_dependency+0x92/0x130
> > [ 460.684924] workqueue: PF_MEMALLOC task 172(kworker/1:1H) is flushing !WQ_MEM_RECLAIM events:__i915_gem_free_work [i915]
> >
> > If I allocate the workqueue with WQ_MEM_RECLAIM, it complains bitterly
> > as well.
>
> So in PF_MEMALLOC context we can't flush a workqueue with
> !WQ_MEM_RECLAIM.
>
> 	system_wq = alloc_workqueue("events", 0, 0);
>
> My point is that synchronize_rcu_expedited will still push its work
> into the same system_wq workqueue...
>
> 	/* Marshall arguments & schedule the expedited grace period. */
> 	rew.rew_func = func;
> 	rew.rew_rsp = rsp;
> 	rew.rew_s = s;
> 	INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp);
> 	schedule_work(&rew.rew_work);
>
> It's also using schedule_work, so either the above is a false
> positive, or we still have a problem with synchronize_rcu_expedited;
> it's just not a reproducible issue anymore after we stop running it
> under the struct_mutex.

We still do have a problem with using synchronize_rcu_expedited() from
the shrinker, as we may be under someone else's mutex that is involved
in its own RCU dance.

> Even synchronize_sched will wait on the system_wq if
> synchronize_rcu_expedited has been issued in parallel by some other
> code; it's just that there's no check for it because it's not invoking
> flush_work to wait.

Right.

> The deadlock happens if we flush_work() while holding any lock that
> may be taken by any of the work items that could be queued there. i915
> makes sure to flush_work only if the struct_mutex was released (not
> my initial version), but we're not checking if any of the other
> system_wq work items could possibly be taking a lock that may have
> been held during the allocation that invoked reclaim. I suppose that
> is the problem left, but I don't see how flush_work is special about
> it; synchronize_rcu_expedited would still have the same issue, no?
> (despite no lockdep warning)
>
> I suspect this means synchronize_rcu_expedited() is not usable in
> reclaim context, and lockdep should warn if PF_MEMALLOC is set when
> synchronize_rcu_expedited/synchronize_sched/synchronize_rcu are
> called.

Yes.

> Probably to fix this we should create a private workqueue for both RCU
> and i915 and stop sharing the system_wq with the rest of the system
> (and of course set WQ_MEM_RECLAIM in both workqueues). This makes sure
> that when we call synchronize_rcu_expedited or flush_work from the
> shrinker, we won't risk waiting on other random work that may be
> taking locks that are held by the code that invoked reclaim during an
> allocation.

We simply do not need to do our own synchronize_rcu* -- it's only used
to flush our slab frees on the off chance that (a) we have any and (b)
we do manage to free a whole slab. It is not the bulk of the memory
that we return to the system from the shrinker. In the other thread, I
stated that we should simply remove it. The kswapd reclaim path should
try to reclaim RCU slabs (by doing a synchronize_sched or equivalent).

> The macro bug of waiting on system_wq 100% of the time while always
> holding the struct_mutex is gone, but we need to perfect this further
> and stop using the system_wq for RCU and i915 shrinker work.

Agreed. My preference is to simply not do it and leave the dangling RCU
to the core reclaim paths.
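
As an aside, a rough sketch of the PF_MEMALLOC warning suggested above.
The helper name rcu_check_reclaim_context() is made up for this mail;
it is not an existing kernel interface, only the shape such a debug
check could take at the top of synchronize_rcu_expedited() and friends:

#include <linux/sched.h>

/* Hypothetical helper, illustration only. */
static inline void rcu_check_reclaim_context(void)
{
	/*
	 * Waiting for an expedited grace period from direct reclaim can
	 * block behind unrelated system_wq work, so complain loudly if
	 * we are called with PF_MEMALLOC set.
	 */
	WARN_ON_ONCE(current->flags & PF_MEMALLOC);
}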
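
And for completeness, even though my preference is to drop the sync
from the shrinker entirely, a minimal sketch of the private
WQ_MEM_RECLAIM queue suggested above for the i915 free worker. The
names i915_free_wq and i915_gem_init_free_wq() are invented for this
example; this is not a patch:

#include <linux/workqueue.h>

static struct workqueue_struct *i915_free_wq;

static int i915_gem_init_free_wq(void)
{
	/*
	 * WQ_MEM_RECLAIM gives the queue its own rescuer thread, so
	 * flushing it from PF_MEMALLOC (direct reclaim) context no
	 * longer trips check_flush_dependency().
	 */
	i915_free_wq = alloc_workqueue("i915-gem-free", WQ_MEM_RECLAIM, 0);
	return i915_free_wq ? 0 : -ENOMEM;
}

/*
 * The free worker would then be queued with
 *	queue_work(i915_free_wq, &i915->mm.free_work);
 * instead of schedule_work(), so a flush_work() from the shrinker only
 * waits on our own work, never on unrelated system_wq items that may
 * be holding the caller's locks.
 */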
-Chris
-- 
Chris Wilson, Intel Open Source Technology Centre