On Sun, Mar 08, 2020 at 06:24:24PM -0700, Eric Biggers wrote:
> On Mon, Mar 09, 2020 at 10:12:53AM +1100, Dave Chinner wrote:
> > On Sat, Mar 07, 2020 at 09:52:21PM -0800, Eric Biggers wrote:
> > > From: Eric Biggers <ebiggers@xxxxxxxxxx>
> > >
> > > When a thread loses the workqueue allocation race in
> > > sb_init_dio_done_wq(), lockdep reports that the call to
> > > destroy_workqueue() can deadlock waiting for work to complete. This is
> > > a false positive since the workqueue is empty. But we shouldn't simply
> > > skip the lockdep check for empty workqueues for everyone.
> >
> > Why not? If the wq is empty, it can't deadlock, so this is a problem
> > with the workqueue lockdep annotations, not a problem with code that
> > is destroying an empty workqueue.
>
> Skipping the lockdep check when flushing an empty workqueue would reduce the
> ability of lockdep to detect deadlocks when flushing that workqueue. I.e., it
> could cause lots of false negatives, since there are many cases where workqueues
> are *usually* empty when flushed/destroyed but it's still possible that they are
> nonempty.
>
> >
> > > Just avoid this issue by using a mutex to serialize the workqueue
> > > allocation. We still keep the preliminary check for ->s_dio_done_wq, so
> > > this doesn't affect direct I/O performance.
> > >
> > > Also fix the preliminary check for ->s_dio_done_wq to use READ_ONCE(),
> > > since it's a data race. (That part wasn't actually found by syzbot yet,
> > > but it could be detected by KCSAN in the future.)
> > >
> > > Note: the lockdep false positive could alternatively be fixed by
> > > introducing a new function like "destroy_unused_workqueue()" to the
> > > workqueue API as previously suggested. But I think it makes sense to
> > > avoid the double allocation anyway.
> >
> > Fix the infrastructure, don't work around it be placing constraints
> > on how the callers can use the infrastructure to work around
> > problems internal to the infrastructure.
>
> Well, it's also preferable not to make our debugging tools less effective to
> support people doing weird things that they shouldn't really be doing anyway.
>
> (BTW, we need READ_ONCE() on ->sb_init_dio_done_wq anyway to properly annotate
> the data race. That could be split into a separate patch though.)
>
> Another idea that came up is to make each workqueue_struct track whether work
> has been queued on it or not yet, and make flush_workqueue() skip the lockdep
> check if the workqueue has always been empty. (That could still cause lockdep
> false negatives, but not as many as if we checked if the workqueue is
> *currently* empty.) Would you prefer that solution? Adding more overhead to
> workqueues would be undesirable though, so I think it would have to be
> conditional on CONFIG_LOCKDEP, like (untested):

I can't speak for Dave, but if the problem here really is that lockdep's
modelling of flush_workqueue()'s behavior could be improved to eliminate
false reports, then this seems reasonable to me...
--D

> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 301db4406bc37..72222c09bcaeb 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -263,6 +263,7 @@ struct workqueue_struct {
>  	char			*lock_name;
>  	struct lock_class_key	key;
>  	struct lockdep_map	lockdep_map;
> +	bool			used;
>  #endif
>  	char			name[WQ_NAME_LEN]; /* I: workqueue name */
>  
> @@ -1404,6 +1405,9 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
>  	lockdep_assert_irqs_disabled();
>  
>  	debug_work_activate(work);
> +#ifdef CONFIG_LOCKDEP
> +	WRITE_ONCE(wq->used, true);
> +#endif
>  
>  	/* if draining, only works from the same workqueue are allowed */
>  	if (unlikely(wq->flags & __WQ_DRAINING) &&
> @@ -2772,8 +2776,12 @@ void flush_workqueue(struct workqueue_struct *wq)
>  	if (WARN_ON(!wq_online))
>  		return;
>  
> -	lock_map_acquire(&wq->lockdep_map);
> -	lock_map_release(&wq->lockdep_map);
> +#ifdef CONFIG_LOCKDEP
> +	if (READ_ONCE(wq->used)) {
> +		lock_map_acquire(&wq->lockdep_map);
> +		lock_map_release(&wq->lockdep_map);
> +	}
> +#endif
>  
>  	mutex_lock(&wq->mutex);
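
For comparison, the alternative from the original patch description (keep the
lock-free preliminary check, then serialize the allocation with a mutex so a
losing thread never has to destroy a workqueue) might look roughly like the
userspace sketch below. This is only an illustration under stated assumptions,
not the kernel patch: struct fake_sb, struct fake_wq, fake_init_dio_done_wq(),
wq_init_mutex and alloc_count are hypothetical names, C11 atomics stand in for
READ_ONCE(), and a pthread mutex stands in for a kernel mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a workqueue; not the kernel's workqueue_struct. */
struct fake_wq {
	int id;
};

/* Hypothetical stand-in for the superblock field ->s_dio_done_wq. */
struct fake_sb {
	_Atomic(struct fake_wq *) dio_done_wq;
};

/* Serializes allocation so that at most one thread ever allocates. */
static pthread_mutex_t wq_init_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Counts allocations, only so the sketch can be checked. */
static atomic_int alloc_count;

/* Returns 0 on success, -1 if the allocation failed. */
static int fake_init_dio_done_wq(struct fake_sb *sb)
{
	struct fake_wq *wq;
	int ret = 0;

	assert(sb != NULL);

	/* Fast path: annotated lock-free check, so the common case takes no lock. */
	if (atomic_load_explicit(&sb->dio_done_wq, memory_order_acquire))
		return 0;

	pthread_mutex_lock(&wq_init_mutex);
	/* Recheck under the mutex: another thread may have won the race. */
	if (!atomic_load_explicit(&sb->dio_done_wq, memory_order_relaxed)) {
		wq = malloc(sizeof(*wq));
		if (wq) {
			wq->id = atomic_fetch_add(&alloc_count, 1) + 1;
			/* Publish only after the object is fully initialized. */
			atomic_store_explicit(&sb->dio_done_wq, wq,
					      memory_order_release);
		} else {
			ret = -1;
		}
	}
	pthread_mutex_unlock(&wq_init_mutex);
	return ret;
}
```

With this shape there is no "loser" that has to free a never-used object, so
the destroy path that triggers the false positive is not reached at all; the
cost is one extra mutex acquisition on the slow path only.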