Re: blocking write() after disconnecting cifs server

Jeff Layton <jlayton@xxxxxxxxx> · Thu, 2 Jan 2014 14:31:57 -0500

On Thu, 2 Jan 2014 17:04:27 +0100 (CET)
mail654@xxxxxx wrote:

> > > write() from the cifs kernel driver blocks when the cifs server is disconnected. The blocking call didn't return after 30 minutes. Client and server are connected via a switch and the server's LAN cable is unplugged during the write call. I use kernel 3.11.8 and mounted without the "hard" option.
> > > 
> > > Is there a possibility for a non-blocking write() without using O_SYNC or the "directio" mount option?
> > > 
> > > Way to reproduce the scenario: Below is a sample program which calls write() in a loop. The error messages appear when unplugging the cable during this loop.
> > > 
> > > Kind regards,
> > > Hagen
> > > 
> > > CIFS VFS: sends on sock ffff88003710c280 stuck for 15 seconds
> > > CIFS VFS: Error -11 sending data on socket to server
> > > 
> > > #include <fstream>
> > > #include <iostream>
> > > int main () {
> > >   const int size = 100000;
> > >   char buffer[size];
> > >   std::ofstream outfile("/mnt/new.bin",std::ofstream::binary);
> > >   if (!outfile.is_open())
> > >   {
> > >     return 1;
> > >   }
> > >   for (int idx=0; idx<10000 && outfile.good(); idx++)
> > >   {
> > >     outfile.write(buffer,size);
> > >     std::cout << "written, size=" << size << std::endl;
> > >   }
> > >   std::cout << "finished " << outfile.good() << std::endl;
> > >   outfile.close();
> > >   return 0;
> > > }
> > 
> > A hang of that length is unexpected. If you're able to reproduce this,
> > can you get the stack from the task issuing the write at the time?
> > 
> >     $ cat /proc/<pid>/stack
> > 
> > That might give us a clue as to what it's doing.
> 
> [<ffffffff8170ab8c>] balance_dirty_pages.isra.19+0x4ac/0x55c
> [<ffffffff8115455b>] balance_dirty_pages_ratelimited+0xeb/0x110
> [<ffffffff81148f3a>] generic_perform_write+0x16a/0x210
> [<ffffffff8114903d>] generic_file_buffered_write+0x5d/0x90
> [<ffffffff8114aa66>] __generic_file_aio_write+0x1b6/0x3b0
> [<ffffffff8114acc9>] generic_file_aio_write+0x69/0xd0
> [<ffffffffa03ef225>] cifs_strict_writev+0xa5/0xd0 [cifs]
> [<ffffffff811b2b95>] do_sync_readv_writev+0x65/0x90
> [<ffffffff811b4312>] do_readv_writev+0xd2/0x2b0
> [<ffffffff811b452c>] vfs_writev+0x3c/0x50
> [<ffffffff811b46a2>] SyS_writev+0x52/0xc0
> [<ffffffff8172976f>] tracesys+0xe1/0xe6
> [<ffffffffffffffff>] 0xffffffffffffffff
> 

Looks like it's stuck in dirty page throttling.

What's likely happening is that you have a bunch of dirty pages when
you go to pull the cable. At that point the system is trying to flush
the pages so that this task can try to dirty more of them.

What *should* happen (at least if this is a soft mount) is that the
writeback of those pages eventually times out, the pages get their
error bit set, and the write() syscalls then go through.
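One way to see whether the dirty pages are actually draining is to
sample the system-wide "Dirty" and "Writeback" counters in
/proc/meminfo while the reproducer runs. The following is only a rough
sketch: meminfo_kb() and read_meminfo() are hypothetical helper names,
and the counters cover the whole system rather than just the cifs
mount.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Return the value in kB of a field such as "Dirty" or "Writeback" from
// text in /proc/meminfo format, or -1 if the field is missing.
long meminfo_kb(const std::string& text, const std::string& field) {
  assert(!field.empty());
  const std::string prefix = field + ":";
  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line)) {
    // The trailing ':' keeps "Writeback" from matching "WritebackTmp".
    if (line.compare(0, prefix.size(), prefix) != 0) continue;
    std::istringstream rest(line.substr(prefix.size()));
    long kb = -1;
    return (rest >> kb) ? kb : -1;
  }
  return -1;
}

// Hypothetical helper: read the live counters from /proc/meminfo.
// Returns an empty string if the file cannot be opened.
std::string read_meminfo() {
  std::ifstream f("/proc/meminfo");
  std::ostringstream contents;
  contents << f.rdbuf();
  return contents.str();
}
```

Printing meminfo_kb(read_meminfo(), "Dirty") and the same for
"Writeback" once a second would show whether the numbers stay flat
after the cable is pulled, which would suggest that writeback is not
making progress.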

Have you tried stracing this to confirm that the write syscall really
never returns in this situation? Is it possible that the write()
syscalls are returning, albeit slowly?
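If strace is inconvenient, a variant of the reproducer that times each
write() call directly could answer the same question. This is a
minimal sketch, not a tested diagnostic: time_writes() and WriteTiming
are hypothetical names, and it calls write(2) directly rather than
going through std::ofstream so that each timing corresponds to one
syscall.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>
#include <fcntl.h>
#include <unistd.h>

struct WriteTiming {
  int completed = 0;         // write() calls that returned without error
  int failed = 0;            // open() or write() calls that returned -1
  double slowest_sec = 0.0;  // longest single write() call, in seconds
};

// Write `count` chunks of `chunk_size` zero bytes to `path`, logging how
// long each write() call takes. Short writes are counted as completed.
WriteTiming time_writes(const char* path, std::size_t chunk_size, int count) {
  assert(chunk_size > 0);
  WriteTiming t;
  int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) {
    t.failed = 1;
    return t;
  }
  std::vector<char> buffer(chunk_size, 0);
  for (int i = 0; i < count; ++i) {
    auto start = std::chrono::steady_clock::now();
    ssize_t n = write(fd, buffer.data(), buffer.size());
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    if (elapsed.count() > t.slowest_sec) t.slowest_sec = elapsed.count();
    // stderr is unbuffered, so the last line printed identifies the last
    // call that returned if a later one blocks.
    std::fprintf(stderr, "write %d returned %zd after %.3f s\n", i,
                 n, elapsed.count());
    if (n < 0) {
      ++t.failed;
      break;
    }
    ++t.completed;
  }
  close(fd);
  return t;
}
```

Calling time_writes("/mnt/new.bin", 100000, 10000) mirrors the loop in
the original test program. If the log keeps growing with multi-second
entries, the calls are returning slowly; if it stops for good, one
call is genuinely stuck.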

-- 
Jeff Layton <jlayton@xxxxxxxxx>