Re: [RFC PATCH] cifs: Fix possible deadlock with cifs and work queues

Pavel Shilovsky <piastry@xxxxxxxxxxx> · Fri, 21 Mar 2014 12:32:12 +0400

2014-03-21 6:23 GMT+04:00 Steven Rostedt <rostedt@xxxxxxxxxxx>:
> On Thu, 20 Mar 2014 17:02:39 -0400
> Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>
>> Eventually the server should just allow the read to complete even if
>> the client doesn't respond to the oplock break. It has to since clients
>> can suddenly drop off the net while holding an oplock. That should
>> allow everything to unwedge eventually (though it may take a while).
>>
>> If that's not happening then I'd be curious as to why...
>
> The problem is that the data is being filled in the page and the reader
> is waiting for the page lock to be released. The kworker for the reader
> will issue the complete() and unlock the page to wake up the reader.
>
> But because the other workqueue callback calls down_read(), and there
> can be a down_write() waiting for the reader to finish, this
> down_read() will block on the lock as well (rwsems are fair locks).
> This blocks the other workqueue callback from issuing the complete and
> page_unlock() that will wake up the reader that is holding the rwsem
> with down_read().
>
> DEADLOCK.

Thank you for reporting and clarifying the issue!

Read and write codepaths both obtain lock_sem for read and then wait
for cifsiod_wq to complete and release lock_sem. They don't do any
lock_sem operations inside their work task queued to cifsiod_wq. But
oplock code can obtain/release lock_sem in its work task. So, that's
why I agree with Jeff and suggest to move the oplock code to a
different work queue (cifsioopd_wq?) but leave read and write
codepaths use cifsiod_wq.

-- 
Best regards,
Pavel Shilovsky.
--
To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html