On Tue, Oct 27, 2020 at 09:29:43AM +1100, Dave Chinner wrote:
> On Sat, Oct 24, 2020 at 09:41:14PM -0700, Darrick J. Wong wrote:
> > On Thu, Oct 22, 2020 at 04:15:31PM +1100, Dave Chinner wrote:
> > > @@ -140,6 +164,7 @@ workqueue_add(
> > >  
> > >  	/* Now queue the new work structure to the work queue. */
> > >  	pthread_mutex_lock(&wq->lock);
> > > +restart:
> > >  	if (wq->next_item == NULL) {
> > >  		assert(wq->item_count == 0);
> > >  		ret = -pthread_cond_signal(&wq->wakeup);
> > > @@ -150,6 +175,16 @@ workqueue_add(
> > >  		}
> > >  		wq->next_item = wi;
> > >  	} else {
> > > +		/* throttle on a full queue if configured */
> > > +		if (wq->max_queued && wq->item_count == wq->max_queued) {
> > > +			pthread_cond_wait(&wq->queue_full, &wq->lock);
> > 
> > I ported xfs_scrub to use max_queued for the inode scanner, and got a
> > hang here.  It uses two workqueues -- the first is an unbounded
> > workqueue that receives one work item per AG, in which each work item
> > calls INUMBERS, creates a work item for the returned inode chunk, and
> > throws it at the second workqueue.  The second workqueue is a bounded
> > workqueue that calls BULKSTAT on the INUMBERS work item and then calls
> > the iteration function on each bulkstat record returned.
> > 
> > The hang happens when the inumbers workqueue has more than one thread
> > running.
> 
> IIUC, that means you have multiple producer threads? IIRC, the usage
> in this patchset is single producer, so it won't hit this problem...

Right, there are multiple producers, because that seemed like fun.  At
this stage, I'm merely using this patch in anger (as it were) to
prototype a fix to scrub.

I'm not even sure it's a reasonable enhancement for xfs_scrub, since
each of those bulkstat workers will then fight with the inumbers
workers for the AGI, but so it goes.  It does seem to eliminate the
problem of one thread working hard on an AG that has one huge
fragmented file while the other threads sit idle, but otherwise I
mostly just see more buffer lock contention.  The total runtime doesn't
change on more balanced filesystems, at least. :P

> > Both threads notice the full workqueue and wait on queue_full.  One
> > of the workers in the second workqueue goes to pull off the next work
> > item, ends up in this if body, signals one of the sleeping threads,
> > and starts calling bulkstat.
> > 
> > In the time it takes to wake up the sleeping thread from wq1, the
> > second workqueue pulls far enough ahead that the single thread from
> > wq1 never manages to fill wq2 again.  Often, the wq1 thread was
> > sleeping so that it could add the last inode chunk of that AG to wq2.
> > We therefore never wake up the *other* sleeping thread from wq1, and
> > the whole app stalls.
> > 
> > I dunno if that's a sane way to structure an inumbers/bulkstat scan,
> > but it seemed reasonable to me.  I can envision two possible fixes
> > here: (1) use pthread_cond_broadcast to wake everything up; or (2)
> > always call pthread_cond_signal when we pull a work item off the
> > queue.  Thoughts?
> 
> pthread_cond_broadcast() makes more sense, but I suspect there will
> be other issues with multiple producers that render the throttling
> ineffective. I suspect supporting multiple producers should be a
> separate patchset...

<nod> Making change (2) seems to work for multiple producers, but I
guess I'll keep poking at perf to see if I discover anything exciting.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
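
P.S. For anyone following along at home, here's a minimal standalone
sketch of change (2).  None of this is the actual xfs_repair workqueue
code -- the bounded_queue/bq_add/bq_pull names are made up for
illustration -- it just demonstrates the "wake one producer per
dequeue" idea:

/*
 * Toy bounded producer/consumer queue.  Initialize lock with
 * PTHREAD_MUTEX_INITIALIZER and both condvars with
 * PTHREAD_COND_INITIALIZER.  The list is LIFO to keep the sketch
 * short; ordering doesn't matter for the wakeup logic.
 */
#include <pthread.h>
#include <stddef.h>

struct q_item {
	struct q_item		*next;
};

struct bounded_queue {
	pthread_mutex_t		lock;
	pthread_cond_t		wakeup;		/* consumers wait here */
	pthread_cond_t		queue_full;	/* producers wait here */
	struct q_item		*head;
	unsigned int		item_count;
	unsigned int		max_queued;	/* 0 == unbounded */
};

void
bq_add(
	struct bounded_queue	*bq,
	struct q_item		*wi)
{
	pthread_mutex_lock(&bq->lock);

	/*
	 * while, not if: a woken producer can lose the race to another
	 * producer and must re-check the depth before inserting.
	 */
	while (bq->max_queued && bq->item_count == bq->max_queued)
		pthread_cond_wait(&bq->queue_full, &bq->lock);

	wi->next = bq->head;
	bq->head = wi;
	bq->item_count++;

	/* one new item, wake one consumer */
	pthread_cond_signal(&bq->wakeup);
	pthread_mutex_unlock(&bq->lock);
}

struct q_item *
bq_pull(
	struct bounded_queue	*bq)
{
	struct q_item		*wi;

	pthread_mutex_lock(&bq->lock);
	while (bq->head == NULL)
		pthread_cond_wait(&bq->wakeup, &bq->lock);

	wi = bq->head;
	bq->head = wi->next;
	bq->item_count--;

	/*
	 * Change (2): every dequeue frees exactly one queue slot, so
	 * wake exactly one throttled producer every time, not only
	 * when the queue drains.  With N producers parked on
	 * queue_full, signalling only on drain can leave N-1 of them
	 * asleep forever once the consumers pull ahead -- the hang
	 * described above.
	 */
	if (bq->max_queued)
		pthread_cond_signal(&bq->queue_full);

	pthread_mutex_unlock(&bq->lock);
	return wi;
}

Change (1) would instead be a single pthread_cond_broadcast() on
queue_full at whatever point currently signals; the while loop around
the producer-side wait makes the resulting thundering herd safe, just
wasteful.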