Re: [PATCHv3] sbitmap: fix batched wait_cnt accounting

Keith Busch <kbusch@xxxxxxxxxx> · Tue, 6 Sep 2022 15:48:13 -0600

On Sun, Sep 04, 2022 at 06:39:14AM -0600, Jens Axboe wrote:
> On 9/1/22 10:43 AM, Jens Axboe wrote:
> > On Thu, 25 Aug 2022 07:53:12 -0700, Keith Busch wrote:
> >> From: Keith Busch <kbusch@xxxxxxxxxx>
> >>
> >> Batched completions can clear multiple bits, but we're only decrementing
> >> the wait_cnt by one each time. This can cause waiters to never be woken,
> >> stalling IO. Use the batched count instead.
> >>
> >>
> >> [...]
> > 
> > Applied, thanks!
> > 
> > [1/1] sbitmap: fix batched wait_cnt accounting
> >       commit: 16ede66973c84f890c03584f79158dd5b2d725f5
> 
> This is causing CPU stalls for me running make -j256 with the source
> hosted on an ATA device with QD=32. It's not running with a scheduler.
> It just goes spammy on most/all CPUs so hard to get a real trace out of
> it, but it looks like we're just looping forever off
> sbitmap_queue_wake_up().
> 
> I'm going to revert this one for now until we can investigate what is
> going on here.

I was able to reproduce this without much trouble. I think it needs to restore
the wait_cnt if we're racing with the waitqueue_active() check, since the count
has already been decremented by that point. I think the problem exists even
without this patch ([1]), but you'd be unlikely to hit it when decrementing
wait_cnt by just one at a time with a wake_batch > 1. This diff on top of the
patch should fix it:

---
-	if (!waitqueue_active(&ws->wait))
+	if (!waitqueue_active(&ws->wait)) {
+		atomic_add(nr, &ws->wait_cnt);
		return true;
+	}
--
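
To illustrate the accounting only, here is a minimal userspace sketch using C11
atomics. It is not the kernel code: struct ws_model, has_waiters and
account_batch() are hypothetical names made up for this example, and
has_waiters merely stands in for waitqueue_active(&ws->wait). The assumption is
that wait_cnt has already been reduced by nr before the waiter check, so
returning early without adding nr back leaves the count short.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-queue wait state; not the kernel struct. */
struct ws_model {
	atomic_int wait_cnt;	/* completions still owed before a wakeup */
	bool has_waiters;	/* models waitqueue_active(&ws->wait) */
};

/*
 * Account for nr completions. Returns true when nobody is waiting on this
 * queue (the caller would move on to another one), false otherwise.
 * With restore == true the early return gives nr back, as in the diff above.
 */
bool account_batch(struct ws_model *ws, int nr, bool restore)
{
	assert(nr > 0);

	/* The decrement happens before we know whether anyone is waiting. */
	atomic_fetch_sub(&ws->wait_cnt, nr);

	if (!ws->has_waiters) {
		if (restore)
			atomic_fetch_add(&ws->wait_cnt, nr);
		return true;
	}

	/* The real code would go on to decide whether to wake a batch. */
	return false;
}
```

In this model, calling account_batch() repeatedly on a queue with no waiters
keeps lowering wait_cnt when restore is false, and leaves it unchanged when
restore is true. Larger nr values make the drift faster, which would be
consistent with the problem being hard to hit when decrementing by one.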

[1] https://lore.kernel.org/linux-block/Yxe7V3yfBcADoYLE@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/T/#t