On Tue, 2018-08-21 at 09:08 -0700, Tejun Heo wrote:
>
> > -static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr)
> > +static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr,
> > +			     bool from_cancel)
> >  {
> >  	struct worker *worker = NULL;
> >  	struct worker_pool *pool;
> > @@ -2885,7 +2886,8 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr)
> >  	 * workqueues the deadlock happens when the rescuer stalls, blocking
> >  	 * forward progress.
> >  	 */
> > -	if (pwq->wq->saved_max_active == 1 || pwq->wq->rescuer) {
> > +	if (!from_cancel &&
> > +	    (pwq->wq->saved_max_active == 1 || pwq->wq->rescuer)) {
> >  		lock_map_acquire(&pwq->wq->lockdep_map);
> >  		lock_map_release(&pwq->wq->lockdep_map);
> >  	}
>
> But this can lead to a deadlock.  I'd much rather err on the side of
> discouraging complex lock dancing around ordered workqueues, no?

What can lead to a deadlock?

Writing out the example again, with the unlock now:

static DEFINE_MUTEX(mutex);
static struct workqueue_struct *ordered_wq; /* from alloc_ordered_workqueue() */

static void work1_function(struct work_struct *work)
{
	mutex_lock(&mutex);
	mutex_unlock(&mutex);
}

static void work2_function(struct work_struct *work)
{
	/* nothing */
}

static DECLARE_WORK(work1, work1_function);
static DECLARE_WORK(work2, work2_function);

static void other_function(void)
{
	queue_work(ordered_wq, &work1);
	queue_work(ordered_wq, &work2);
	mutex_lock(&mutex);
	cancel_work_sync(&work2);
	mutex_unlock(&mutex);
}

This shouldn't be able to lead to a deadlock, as I had explained:

> In cancel_work_sync(), we can only have one of two cases, even
> with an ordered workqueue:
>  * the work isn't running, just cancelled before it started
>  * the work is running, but then nothing else can be on the
>    workqueue before it

johannes