Re: blocking write() after disconnecting cifs server

mail654@xxxxxx · Fri, 3 Jan 2014 10:10:32 +0100 (CET)

jlayton@xxxxxxxxx wrote:
> On Thu, 2 Jan 2014 17:04:27 +0100 (CET)
> mail654@xxxxxx wrote:
> 
> > > > write() from the cifs kernel driver blocks when the cifs server is disconnected. The blocking call had not returned after 30 minutes. Client and server are connected via a switch, and the server's LAN cable is unplugged during the write call. I use kernel 3.11.8 and mounted without the "hard" option.
> > > > 
> > > > Is there a way to get a non-blocking write() without using O_SYNC or the "directio" mount option?
> > > > 
> > > > Way to reproduce the scenario: Below is a sample program which calls write() in a loop. The error messages appear when unplugging the cable during this loop.
> > > > 
> > > > Kind regards,
> > > > Hagen
> > > > 
> > > > CIFS VFS: sends on sock ffff88003710c280 stuck for 15 seconds
> > > > CIFS VFS: Error -11 sending data on socket to server
> > > > 
> > > > #include <fstream>
> > > > #include <iostream>
> > > > int main () {
> > > >   const int size = 100000;
> > > >   char buffer[size] = {};  // zero-initialize so no indeterminate bytes are written
> > > >   std::ofstream outfile("/mnt/new.bin",std::ofstream::binary);
> > > >   if (!outfile.is_open())
> > > >   {
> > > >     return 1;
> > > >   }
> > > >   for (int idx=0; idx<10000 && outfile.good(); idx++)
> > > >   {
> > > >     outfile.write(buffer,size);
> > > >     std::cout << "written, size=" << size << std::endl;
> > > >   }
> > > >   std::cout << "finished " << outfile.good() << std::endl;
> > > >   outfile.close();
> > > >   return 0;
> > > > }
> > > 
> > > A hang of that length is unexpected. If you're able to reproduce this,
> > > can you get the stack from the task issuing the write at the time?
> > > 
> > >     $ cat /proc/<pid>/stack
> > > 
> > > That might give us a clue as to what it's doing.
> > 
> > [<ffffffff8170ab8c>] balance_dirty_pages.isra.19+0x4ac/0x55c
> > [<ffffffff8115455b>] balance_dirty_pages_ratelimited+0xeb/0x110
> > [<ffffffff81148f3a>] generic_perform_write+0x16a/0x210
> > [<ffffffff8114903d>] generic_file_buffered_write+0x5d/0x90
> > [<ffffffff8114aa66>] __generic_file_aio_write+0x1b6/0x3b0
> > [<ffffffff8114acc9>] generic_file_aio_write+0x69/0xd0
> > [<ffffffffa03ef225>] cifs_strict_writev+0xa5/0xd0 [cifs]
> > [<ffffffff811b2b95>] do_sync_readv_writev+0x65/0x90
> > [<ffffffff811b4312>] do_readv_writev+0xd2/0x2b0
> > [<ffffffff811b452c>] vfs_writev+0x3c/0x50
> > [<ffffffff811b46a2>] SyS_writev+0x52/0xc0
> > [<ffffffff8172976f>] tracesys+0xe1/0xe6
> > [<ffffffffffffffff>] 0xffffffffffffffff
> > 
> 
> Looks like it's stuck in dirty page throttling.
> 
> What's likely happening is that you have a bunch of dirty pages when
> you go to pull the cable. At that point the system is trying to flush
> the pages so that this task can try to dirty more of them.
> 
> What *should* happen (at least if this is a soft mount) is that the
> writeback of those pages eventually times out, the pages get their
> error bit set and eventually the write() syscalls go through.
> 
> Have you tried stracing this, and can you tell whether the write
> syscall really never returns in this situation? Is it possible that the
> write() syscalls are returning, albeit slowly?

No. In several strace runs I have never seen a write() syscall return after
pulling the cable.
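
For anyone who wants to check the "returning, albeit slowly" case without
strace, here is a minimal sketch that times each write() call directly. It
uses the raw POSIX write() instead of std::ofstream so that one loop
iteration corresponds to one syscall. The helper name time_writes and its
parameters are my own assumptions for illustration; this is not the test
program quoted above.

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper (not part of the original test program): writes
// `count` blocks of `size` zero bytes to `path` and returns the duration
// of each completed write() call in milliseconds. Each duration is also
// printed to stderr as soon as the call returns, so a call that blocks
// forever shows up as output that simply stops.
std::vector<double> time_writes(const char* path, int count, std::size_t size) {
  std::vector<double> durations_ms;
  std::vector<char> buffer(size, 0);

  int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) {
    return durations_ms;  // could not open the file; nothing was measured
  }

  for (int idx = 0; idx < count; idx++) {
    auto start = std::chrono::steady_clock::now();
    ssize_t written = write(fd, buffer.data(), buffer.size());
    auto stop = std::chrono::steady_clock::now();
    if (written < 0) {
      break;  // write() reported an error; stop measuring
    }
    double ms = std::chrono::duration<double, std::milli>(stop - start).count();
    durations_ms.push_back(ms);
    std::cerr << "write " << idx << " took " << ms << " ms" << std::endl;
  }

  close(fd);
  return durations_ms;
}
```

Calling time_writes("/mnt/new.bin", 10000, 100000) from main() would mirror
the loop in the original program. If the stderr output stops after the cable
is pulled and never resumes, the call is blocked rather than slow.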

Hagen