Re: Races in sbitmap batched wakeups

On Fri 17-06-22 09:07:18, Ming Lei wrote:
> On Thu, Jun 16, 2022 at 07:21:02PM +0200, Jan Kara wrote:
> > Hello!
> > 
> > I've been debugging some customer reports of tasks hanging (forever)
> > waiting for free tags when in fact all tags are free. After looking into it
> > for some time I think I know what is happening. First, keep in mind that
> > it concerns a device which uses shared tags. There are 127 tags available
> > and the number of active queues using these tags is easily 40 or more. So
> > the number of tags available for each device is rather small. Now I'm not sure
> > how batched wakeups can ever work in such situations, but maybe I'm missing
> > something.
> > 
> > So take for example a situation where two tags are available for a device,
> > they are both currently used. Now a process comes into blk_mq_get_tag() and
> > wants to allocate a tag and goes to sleep. Now how can it ever be woken up if
> > wake_batch is 4? If the two IOs complete, sbitmap will get two wakeups but
> > that's not enough to trigger the batched wakeup to really wake up the
> > waiter...
> 
> commit 180dccb0dba4 ("blk-mq: fix tag_get wait task can't be awakened")
> is supposed to address this kind of issue.

I have observed the deadlock with the above fixes applied.

> > Even if we have say 4 tags available so in theory there should be enough
> > wakeups to fill the batch, there can be the following problem. So 4 tags
> > are in use, two processes come to blk_mq_get_tag() and sleep, one on wait
> > queue 0, one on wait queue 1. Now four IOs complete so
> > sbitmap_queue_wake_up() gets called 4 times and the fourth call decrements
> > wait_cnt to 0 so it ends up calling wake_up_nr(wq0, 4). Fine, one of the
> > waiters is woken up but the other one is still sleeping in wq1 and there
> > are not enough wakeups to fill the batch and wake it up? This is
> > essentially because we have lost three wakeups on wq0 because it didn't
> > have enough waiters to wake...
> 
> But the following completions will wake up the waiter in wq1, given
> there are more in-flight.

Well, there is only one more request in flight - from the unblocked waiter.
And once that request completes, it will generate just one wakeup which is
not enough to wake the waiter on wq1 because wake_batch is 4...
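
To make the sequence above concrete, here is a minimal userspace sketch of
the batching logic as I understand it. It is a toy model, not the kernel
code: the names WaitQueue, BatchedWakeModel and run_scenario are made up for
illustration, locking and memory ordering are ignored, and the only behaviour
modelled is "decrement wait_cnt on the first wait queue that has waiters, and
wake up to wake_batch waiters when it reaches zero".

```python
from collections import deque


class WaitQueue:
    """Toy stand-in for one sbitmap wait queue (hypothetical name)."""

    def __init__(self, wake_batch):
        self.wait_cnt = wake_batch  # completions still needed before a wakeup
        self.waiters = deque()      # names of sleeping tasks


class BatchedWakeModel:
    """Simplified model of batched wakeups; an assumption, not kernel code."""

    def __init__(self, nr_queues, wake_batch):
        self.wake_batch = wake_batch
        self.queues = [WaitQueue(wake_batch) for _ in range(nr_queues)]
        self.wake_index = 0

    def add_waiter(self, queue_idx, name):
        self.queues[queue_idx].waiters.append(name)

    def wake_up(self):
        """Account one completed IO; return the list of tasks woken by it."""
        n = len(self.queues)
        # Pick the first wait queue with waiters, starting at wake_index.
        for off in range(n):
            idx = (self.wake_index + off) % n
            ws = self.queues[idx]
            if ws.waiters:
                break
        else:
            return []  # nobody is waiting, the wakeup is simply dropped
        ws.wait_cnt -= 1
        if ws.wait_cnt > 0:
            return []  # batch not full yet, nobody is woken
        # Batch is full: reset the counter, move on, wake up to wake_batch tasks.
        ws.wait_cnt = self.wake_batch
        self.wake_index = (idx + 1) % n
        woken = []
        while ws.waiters and len(woken) < self.wake_batch:
            woken.append(ws.waiters.popleft())
        return woken


def run_scenario():
    """4 tags in use, wake_batch 4, one waiter on wq0 and one on wq1."""
    model = BatchedWakeModel(nr_queues=2, wake_batch=4)
    model.add_waiter(0, "A")
    model.add_waiter(1, "B")
    woken = []
    for _ in range(4):           # the four in-flight IOs complete
        woken += model.wake_up()
    woken += model.wake_up()     # A's single request completes
    sleeping = [w for q in model.queues for w in q.waiters]
    return woken, sleeping, model.queues[1].wait_cnt


if __name__ == "__main__":
    print(run_scenario())
```

In this model the four completions wake only A, the one following completion
merely moves wq1's counter from 4 to 3, and B stays asleep with nothing left
in flight to generate further wakeups.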

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR


