Hi! On Fri 17-06-22 09:40:11, Yu Kuai wrote: > 在 2022/06/17 1:21, Jan Kara 写道: > > I've been debugging some customer reports of tasks hanging (forever) > > waiting for free tags when in fact all tags are free. After looking into it > > for some time I think I know what it happening. First, keep in mind that > > it concerns a device which uses shared tags. There are 127 tags available > > and the number of active queues using these tags is easily 40 or more. So > > number of tags available for each device is rather small. Now I'm not sure > > how batched wakeups can ever work in such situations, but maybe I'm missing > > something. > > > > So take for example a situation where two tags are available for a device, > > they are both currently used. Now a process comes into blk_mq_get_tag() and > > wants to allocate tag and goes to sleep. Now how can it ever be woken up if > > wake_batch is 4? If the two IOs complete, sbitmap will get two wakeups but > > that's not enough to trigger the batched wakeup to really wakeup the > > waiter... > > > > Even if we have say 4 tags available so in theory there should be enough > > wakeups to fill the batch, there can be the following problem. So 4 tags > > are in use, two processes come to blk_mq_get_tag() and sleep, one on wait > > queue 0, one on wait queue 1. Now four IOs complete so > > sbitmap_queue_wake_up() gets called 4 times and the fourth call decrements > > wait_cnt to 0 so it ends up calling wake_up_nr(wq0, 4). Fine, one of the > > waiters is woken up but the other one is still sleeping in wq1 and there > > are not enough wakeups to fill the batch and wake it up? This is > > essentially because we have lost three wakeups on wq0 because it didn't > > have enough waiters to wake... > > From what I see, if tags are shared for multiple devices, wake_batch > should make sure that all waiter will be woke up: > > For example: > there are total 64 tags shared for two devices, then wake_batch is 4(if > both devices are active). If there are waiters, which means at least 32 > tags are grabed, thus 8 queues will ensure to wake up at least once > after 32 tags are freed. Well, yes, wake_batch is updated but as my example above shows it is not enough to fix "wasted" wakeups. > > Finally, sbitmap_queue_wake_up() is racy and if two of them race together, > > they can end up decrementing wait_cnt of wq which does not have any process > > queued which again effectively leads to lost wakeups and possibly > > indefinitely sleeping tasks. > > > > BTW, I do this implementation have some problems on concurrent > scenario, as described in following patch: > > https://lore.kernel.org/lkml/20220415101053.554495-4-yukuai3@xxxxxxxxxx/ Yes, as far as I can see you have identified similar races as I point out in this email. But I'm not sure whether your patch fixes all the possibilities for lost wakeups... Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR