Deadlock between cifs_writev_requeue and cifs_writepages

Pavel Shilovsky <piastryyy@xxxxxxxxx> · Mon, 4 Nov 2019 18:11:39 -0800

пн, 28 окт. 2019 г. в 15:51, Pavel Shilovsky <piastryyy@xxxxxxxxx>:
>
> пн, 28 окт. 2019 г. в 05:13, Frank Sorenson <sorenson@xxxxxxxxxx>:
> >
> > On 10/26/19 4:04 PM, Ronnie Sahlberg wrote:
> > > Steve, Pavel, Frank
> > >
> > > This patch moves the logic in cifsFileInfo_put() into a work-queue so that
> > > it will be processed in a different thread which, importantly, does not hold
> > > any other locks.
> > >
> > > This should address the deadlock that Frank reported in the thread:
> > > Yet another hang after network disruption or smb restart -- multiple file writers
> >
> > Pavel,
> >
> > Thanks for understanding my report and translating it into what it meant.
> >
> > Ronnie,
> > Thanks for the patch.  I'm happy to say that my attempt at the same
> > looks like it would have been similar, had I known what I was doing to
> > begin with :)
> >
> >
> > Unfortunately, I think the patch really only moves the problem from one
> > place to another.  I'm guessing that means we have a second underlying
> > deadlock.  (this reproducer appears to be quite effective)
> >
> >
> > On the plus side, only the one file involved is jammed in the deadlock
> > now, as opposed to the parent directory as well, so it is at least progress.
> >
>
> Hi Frank, Ronnie,
>
> Ok, it seems there is another bug resulting in a deadlock between a
> page lock and writeback (WB) flag in cifs_writev_complete and
> cifs_writepages.
>
> 1) cifs_writev_complete. in the case of WB_SYNC_ALL and EAGAIN
> (reconnect) it calls cifs_writev_requeue. At this point it still has
> WB flag set for all the pages being retried. Inside
> cifs_writev_requeue it tries to lock the page:
>
> kworker/4:3+cifsiod
> #0 __schedule at ffffffffab4f1a45
> #1 schedule at ffffffffab4f1ea0
> #2 io_schedule at ffffffffab4f22a2
> #3 __lock_page at ffffffffaae3f66f
> #4 cifs_writev_complete at ffffffffc07df762
> #5 process_one_work at ffffffffaacecc11
> #6 worker_thread at ffffffffaaced0f0
> #7 kthread at ffffffffaacf2ac2
> #8 ret_from_fork at ffffffffab600215
>
> 2) cifs_writepages. It locks the page and then waits on WB flag to be
> cleared before processing pages:
>
> 1 openloop.sh
> #0 __schedule at ffffffffab4f1a45
> #1 schedule at ffffffffab4f1ea0
> #2 io_schedule at ffffffffab4f22a2
> #3 wait_on_page_bit at ffffffffaae3fbc6
> #4 cifs_writepages at ffffffffc07ffc73
> #5 do_writepages at ffffffffaae4ba71
> #6 __filemap_fdatawrite_range at ffffffffaae4321b
> #7 filemap_write_and_wait at ffffffffaae432aa
> #8 cifs_flush at ffffffffc0800fe3
> #9 filp_close at ffffffffaaeed181
> #10 do_dup2 at ffffffffaaf12c0b
> #11 __x64_sys_dup2 at ffffffffaaf1306a
> #12 do_syscall_64 at ffffffffaac04345
>
> The bug is in cifs_writev_complete and cifs_writev_requeue: the 1st
> needs to clear the WB flag and re-dirty pages before trying to retry
> the IO; the 2nd needs additionally to skip pages that are no longer
> dirty (e.g. written by another thread) and re-set the WB flag before
> sending pages over the wire.
>
> --
> Best regards,
> Pavel Shilovsky

(re-titling and cc'ing Steve and Jeff to comment on this)

I looked at other file systems in the kernel. Different modules do
writepages differently, e.g. CephFS doesn't unlock pages after sending
and only does that once the IO completes. We may probably do the same:
postpone unlocking pages until we have successfully written them to
the server. In this case the retry code path (cifs_writev_requeue)
won't need to lock the pages again and thus the deadlock will be
avoided.

Thoughts?

--
Best regards,
Pavel Shilovsky