Re: Races in sbitmap batched wakeups


 



On 2022/06/17 19:31, Jan Kara wrote:
Hi!

On Fri 17-06-22 09:40:11, Yu Kuai wrote:
On 2022/06/17 1:21, Jan Kara wrote:
I've been debugging some customer reports of tasks hanging (forever)
waiting for free tags when in fact all tags are free. After looking into it
for some time I think I know what is happening. First, keep in mind that
it concerns a device which uses shared tags. There are 127 tags available
and the number of active queues using these tags is easily 40 or more. So
number of tags available for each device is rather small. Now I'm not sure
how batched wakeups can ever work in such situations, but maybe I'm missing
something.

So take for example a situation where two tags are available for a device,
and they are both currently in use. Now a process comes into blk_mq_get_tag(),
wants to allocate a tag, and goes to sleep. How can it ever be woken up if
wake_batch is 4? If the two IOs complete, sbitmap will get two wakeups, but
that's not enough to trigger the batched wakeup and actually wake the
waiter...
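The accounting in that scenario can be replayed with a small Python model. This is a deliberately simplified sketch of how I understand the batched-wakeup counter to behave (one wait queue, a wait_cnt initialized to wake_batch, decremented once per freed tag), not the kernel's sbitmap code; the class and function names are made up for illustration:

```python
# Hypothetical model of one sbitmap wait queue with batched wakeups.
# wait_cnt starts at wake_batch; each freed tag decrements it, and
# sleepers are woken only when it hits zero (then it is reset).
class WaitQueue:
    def __init__(self, wake_batch):
        self.wake_batch = wake_batch
        self.wait_cnt = wake_batch
        self.waiters = []

    def wake_up(self):
        # Called once per freed tag, as sbitmap_queue_wake_up() would be.
        self.wait_cnt -= 1
        if self.wait_cnt == 0:
            woken = self.waiters[:self.wake_batch]
            del self.waiters[:self.wake_batch]
            self.wait_cnt = self.wake_batch
            return woken
        return []

def two_tag_scenario():
    wq = WaitQueue(wake_batch=4)
    wq.waiters.append("task")      # task sleeps in blk_mq_get_tag()
    woken = []
    for _ in range(2):             # only two in-flight IOs ever complete
        woken += wq.wake_up()
    return woken, list(wq.waiters)

woken, still_sleeping = two_tag_scenario()
# wait_cnt only ever reaches 2, never 0, so the task is never woken.
```

Under this model the two completions leave wait_cnt at 2 and the task asleep, which is exactly the hang described above.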

Even if we have say 4 tags available so in theory there should be enough
wakeups to fill the batch, there can be the following problem. So 4 tags
are in use, two processes come to blk_mq_get_tag() and sleep, one on wait
queue 0, one on wait queue 1. Now four IOs complete so
sbitmap_queue_wake_up() gets called 4 times and the fourth call decrements
wait_cnt to 0 so it ends up calling wake_up_nr(wq0, 4). Fine, one of the
waiters is woken up but the other one is still sleeping in wq1 and there
are not enough wakeups to fill the batch and wake it up? This is
essentially because we have lost three wakeups on wq0 because it didn't
have enough waiters to wake...
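The two-wait-queue case can be replayed the same way. Again a hedged sketch, not kernel code: two wait queues each with one sleeper, and all four completions charged to wait queue 0 (as in the example above), so three of the four batched wakeups are wasted on a queue with a single waiter:

```python
# Hypothetical model: four completions all land on wq0; its wait_cnt hits
# zero and it wakes up to wake_batch sleepers, but only one task sleeps
# there, so three wakeups are lost while wq1's waiter keeps sleeping.
class WaitQueue:
    def __init__(self, wake_batch):
        self.wake_batch = wake_batch
        self.wait_cnt = wake_batch
        self.waiters = []

    def wake_up(self):
        self.wait_cnt -= 1
        if self.wait_cnt == 0:
            woken = self.waiters[:self.wake_batch]
            del self.waiters[:self.wake_batch]
            self.wait_cnt = self.wake_batch
            return woken
        return []

wq0, wq1 = WaitQueue(4), WaitQueue(4)
wq0.waiters.append("A")   # one sleeper on each wait queue
wq1.waiters.append("B")

woken = []
for _ in range(4):        # four IOs complete, all charged to wq0
    woken += wq0.wake_up()
# Only "A" is woken; "B" still sleeps on wq1 with no wakeups left over.
```

In this replay only A wakes up; wq1 never sees a single decrement, so B needs four more tag frees before it has any chance of waking.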

 From what I can see, if tags are shared among multiple devices, wake_batch
should ensure that all waiters will be woken up:

For example:
there are 64 tags in total, shared between two devices, so wake_batch is 4
(if both devices are active). If there are waiters, that means at least 32
tags are grabbed, thus the 8 wait queues are each guaranteed to wake up at
least once after 32 tags are freed.
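The arithmetic in that example can be written out. The sketch below assumes the kernel's constants SBQ_WAIT_QUEUES = 8 and SBQ_WAKE_BATCH = 8 and mirrors (in simplified form) how the batch is derived from the per-device share of a shared tag space; it is an illustration, not the kernel's sbq_calc_wake_batch():

```python
# Constants as in lib/sbitmap.c (assumption: 8 wait queues, batch cap 8).
SBQ_WAIT_QUEUES = 8
SBQ_WAKE_BATCH = 8

def shared_depth(total_tags, active_queues):
    # Each active queue gets a fair share of the shared tag space.
    return max(1, total_tags // active_queues)

def calc_wake_batch(depth):
    # Batch is depth / nr_wait_queues, clamped to [1, SBQ_WAKE_BATCH].
    return max(1, min(depth // SBQ_WAIT_QUEUES, SBQ_WAKE_BATCH))

depth = shared_depth(64, 2)      # 32 tags per device
batch = calc_wake_batch(depth)   # 32 // 8 = 4
```

With 64 shared tags and two active devices this yields a depth of 32 and a wake_batch of 4, matching the numbers in the example: 8 wait queues times a batch of 4 accounts for all 32 tags.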

Well, yes, wake_batch is updated but as my example above shows it is not
enough to fix "wasted" wakeups.

Tags can be preempted, which means a new thread can be added to a waitqueue
only if there are no free tags.

Given that condition, I can't think of any way the following scenario could
come about (even though wakeups can be missed):

Only wake_batch tags are still in use, while multiple waitqueues are
still active.

If you think this is possible, can you share the initial conditions and
how it ends up in the problematic scenario?

Thanks,
Kuai

Finally, sbitmap_queue_wake_up() is racy and if two of them race together,
they can end up decrementing the wait_cnt of a wq which does not have any
process queued, which again effectively leads to lost wakeups and possibly
indefinitely sleeping tasks.
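The effect of that race can be shown with a deterministic replay. This is a model of one possible interleaving as I understand the description above (two concurrent callers both charging the same empty wait queue), with invented names; it does not reproduce the actual kernel race mechanics:

```python
# Hypothetical replay of the race: two concurrent sbitmap_queue_wake_up()
# calls both pick the same wait queue (e.g. via a stale wake_index) and
# both decrement its wait_cnt even though no task sleeps there, so both
# wakeups are lost to the waiter sleeping on the other queue.
class SBQ:
    def __init__(self, wake_batch, nr_wqs=2):
        self.wake_batch = wake_batch
        self.wait_cnt = [wake_batch] * nr_wqs
        self.waiters = [[] for _ in range(nr_wqs)]

def wake_up(sbq, wq):
    # Decrement-then-check, as each racer would do on its chosen queue.
    sbq.wait_cnt[wq] -= 1
    if sbq.wait_cnt[wq] == 0:
        woken = sbq.waiters[wq][:sbq.wake_batch]
        del sbq.waiters[wq][:sbq.wake_batch]
        sbq.wait_cnt[wq] = sbq.wake_batch
        return woken
    return []

sbq = SBQ(wake_batch=4)
sbq.waiters[1].append("task")   # the only waiter sleeps on wq1
# Two completions race; both charge the empty wq0:
wake_up(sbq, 0)
wake_up(sbq, 0)
# Both wakeups are now "spent" on wq0; wq1's task needs four more frees.
```

After the replay, wq0 has absorbed two decrements with nobody to wake, while the real waiter on wq1 has made no progress toward its batch of four.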


BTW, I do think this implementation has some problems in concurrent
scenarios, as described in the following patch:

https://lore.kernel.org/lkml/20220415101053.554495-4-yukuai3@xxxxxxxxxx/

Yes, as far as I can see you have identified similar races as I point out
in this email. But I'm not sure whether your patch fixes all the
possibilities for lost wakeups...

								Honza



