Re: [RFC] Bluetooth: Use flush_work instead of cancel_work

Hi Andrei,

On Fri, Jan 13, 2012 at 1:07 PM, Emeltchenko Andrei
<Andrei.Emeltchenko.news@xxxxxxxxx> wrote:
> Hi Marcel,
>
> On Fri, Jan 13, 2012 at 02:18:43PM +0100, Marcel Holtmann wrote:
>> Hi Andrei,
>>
>> > Fix deadlock when cancelling delayed work.
>> >
>> > [  584.676126] ======================================================
>> > [  584.676126] [ INFO: possible circular locking dependency detected ]
>> > [  584.676126] 3.2.0-rc2niko+ #44
>> > [  584.676126] -------------------------------------------------------
>> > [  584.676126] kworker/u:1/30 is trying to acquire lock:
>> > [  584.676126]  (&hdev->lock){+.+.+.}, at: [<f81f001c>] hci_conn_timeout+0x6c/0x190 [bluetooth]
>> > [  584.676126]
>> > [  584.676126] but task is already holding lock:
>> > [  584.676126]  ((&(&conn->disc_work)->work)){+.+...}, at: [<c1065a78>] process_one_work+0x108/0x460
>> > [  584.676126]
>> > [  584.676126] which lock already depends on the new lock.
>> > [  584.676126]
>> > [  584.676126]
>> > [  584.676126] the existing dependency chain (in reverse order) is:
>> > [  584.676126]
>> > [  584.676126] -> #1 ((&(&conn->disc_work)->work)){+.+...}:
>> > [  584.676126]        [<c1086748>] lock_acquire+0x88/0x110
>> > [  584.676126]        [<c1066041>] wait_on_work+0x61/0x210
>> > [  584.676126]        [<c106630a>] __cancel_work_timer+0x6a/0x110
>> > [  584.676126]        [<c10663c0>] cancel_delayed_work_sync+0x10/0x20
>> > [  584.676126]        [<f81f935b>] hci_event_packet+0x3b2b/0x4610 [bluetooth]
>> > [  584.676126]        [<f81ea78e>] hci_rx_work+0x20e/0x4c0 [bluetooth]
>> > [  584.676126]        [<c1065aec>] process_one_work+0x17c/0x460
>> > [  584.676126]        [<c10672e4>] worker_thread+0x124/0x2c0
>> > [  584.676126]        [<c106be44>] kthread+0x84/0x90
>> > [  584.676126]        [<c1567f42>] kernel_thread_helper+0x6/0x10
>> > [  584.676126]
>> > [  584.676126] -> #0 (&hdev->lock){+.+.+.}:
>> > [  584.676126]        [<c10852cd>] __lock_acquire+0xc0d/0x1ab0
>> > [  584.676126]        [<c1086748>] lock_acquire+0x88/0x110
>> > [  584.676126]        [<c155de50>] mutex_lock_nested+0x70/0x320
>> > [  584.676126]        [<f81f001c>] hci_conn_timeout+0x6c/0x190 [bluetooth]
>> > [  584.676126]        [<c1065aec>] process_one_work+0x17c/0x460
>> > [  584.676126]        [<c10672e4>] worker_thread+0x124/0x2c0
>> > [  584.676126]        [<c106be44>] kthread+0x84/0x90
>> > [  584.676126]        [<c1567f42>] kernel_thread_helper+0x6/0x10
>> > [  584.676126]
>> > [  584.676126] other info that might help us debug this:
>> > [  584.676126]
>> > [  584.676126]  Possible unsafe locking scenario:
>> > [  584.676126]
>> > [  584.676126]        CPU0                    CPU1
>> > [  584.676126]        ----                    ----
>> > [  584.676126]   lock((&(&conn->disc_work)->work));
>> > [  584.676126]                                lock(&hdev->lock);
>> > [  584.676126]                                lock((&(&conn->disc_work)->work));
>> > [  584.676126]   lock(&hdev->lock);
>> > [  584.676126]
>> > [  584.676126]  *** DEADLOCK ***
>> > [  584.676126]
>> > [  584.676126] 2 locks held by kworker/u:1/30:
>> > [  584.676126]  #0:  (hdev->name){.+.+.+}, at: [<c1065a78>] process_one_work+0x108/0x460
>> > [  584.676126]  #1:  ((&(&conn->disc_work)->work)){+.+...}, at: [<c1065a78>] process_one_work+0x108/0x460
>> >
>> > Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@xxxxxxxxx>
>> > ---
>> >  net/bluetooth/hci_conn.c |    2 +-
>> >  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> Acked-by: Marcel Holtmann <marcel@xxxxxxxxxxxx>
>
> Apparently it does not fix it completely, the reason might be hci_dev_lock
> in hci_conn_timeout. Maybe instead of lock we could use hold/put?
>
> I will investigate this issue further.

I believe the real problem is having a _sync() call for cancelling or
flushing the delayed work. While you are at it, can you check whether
using __cancel_delayed_work() fixes the problem, please? I guess the
recent move to workqueues added some _sync() calls that might lead to
deadlocks. I'm even thinking it would be good to replace all
cancel_delayed_work_sync() calls with __cancel_delayed_work() ones.
I don't have time to test this now, but if you do, please try it too.
Thanks a lot.
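
To make the suspected cycle concrete, here is a rough userspace sketch
in plain C. It is only a toy model of the dependency chain in the
report above, not lockdep's actual algorithm and not kernel code. The
enum, struct and function names are all made up for illustration. The
assumption it encodes is that only a _sync() cancel waits for a running
handler (which lockdep treats like acquiring the work item), while a
non-sync cancel does not wait and so records no such dependency.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes; the names mirror the lockdep report. */
enum lock_class { HDEV_LOCK, DISC_WORK, NR_CLASSES };

/* dep[a][b] is true when b was acquired (or waited on) while a was held. */
struct dep_graph {
	bool dep[NR_CLASSES][NR_CLASSES];
};

static void graph_init(struct dep_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
static bool reachable(const struct dep_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (g->dep[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	return false;
}

/* Record "held -> wanted"; return true if that edge closes a cycle. */
static bool add_dependency(struct dep_graph *g, int held, int wanted)
{
	bool seen[NR_CLASSES] = { false };
	bool cycle;

	assert(held >= 0 && held < NR_CLASSES);
	assert(wanted >= 0 && wanted < NR_CLASSES);

	cycle = reachable(g, wanted, held, seen);
	g->dep[held][wanted] = true;
	return cycle;
}

/* Model both code paths; sync_cancel selects the kind of cancel used. */
static bool model_has_cycle(bool sync_cancel)
{
	struct dep_graph g;
	bool cycle = false;

	graph_init(&g);

	/*
	 * RX path (chain #1 in the report): the work is cancelled while
	 * hdev->lock is held. Assumption: only the _sync() variant waits
	 * for the handler, so only it adds hdev->lock -> disc_work.
	 */
	if (sync_cancel)
		cycle |= add_dependency(&g, HDEV_LOCK, DISC_WORK);

	/*
	 * Timeout path (chain #0): process_one_work() "holds" disc_work
	 * while hci_conn_timeout() takes hdev->lock.
	 */
	cycle |= add_dependency(&g, DISC_WORK, HDEV_LOCK);

	return cycle;
}
```

In this model, model_has_cycle(true) reports the cycle and
model_has_cycle(false) does not. That only shows the lock-ordering
side. A non-sync cancel can return while the handler is still running,
so whether that is safe for conn->disc_work would still need checking.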

> Best regards
> Andrei Emeltchenko

Best regards,

-- 
Ulisses Furquim
ProFUSION embedded systems
http://profusion.mobi
Mobile: +55 19 9250 0942
Skype: ulissesffs