Re: blocking write() after disconnecting cifs server

Shirish Pargaonkar <shirishpargaonkar@xxxxxxxxx> · Sat, 18 Jan 2014 22:40:15 -0600

If an address_space has the AS_EIO flag set, should subsequent writes
start failing instead of writing to the page cache?
I am also not sure what happens to pages with the PG_error bit set;
they probably get discarded.
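
To check from userspace whether the write() calls really never return or
are just returning slowly, something like the sketch below might help. It
is untested against a CIFS mount and only an illustration: the names
timed_writes and WriteReport are made up here, and the path, count and
chunk size are whatever the caller passes in. It times each write() and
calls fsync() at the end, since a buffered write() can succeed and the
writeback error (for example EIO) may only be reported later by fsync()
or close().

```cpp
#include <cerrno>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>
#include <fcntl.h>
#include <unistd.h>

// Result of one timed run. Hypothetical helper type, not from the thread.
struct WriteReport {
    int completed = 0;        // write() calls that returned without error
    int first_errno = 0;      // errno of the first failing call, 0 if none
    double slowest_ms = 0.0;  // duration of the longest single write() call
};

// Issue `count` write() calls of `chunk` bytes each to `path` and time
// every call. A final fsync() is used to surface writeback errors that
// the buffered write() calls did not report themselves.
WriteReport timed_writes(const char *path, int count, std::size_t chunk) {
    WriteReport r;
    std::vector<char> buf(chunk, 0);  // zero-filled payload

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        r.first_errno = errno;
        return r;
    }

    for (int i = 0; i < count; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        ssize_t n = write(fd, buf.data(), buf.size());
        auto t1 = std::chrono::steady_clock::now();
        double ms =
            std::chrono::duration<double, std::milli>(t1 - t0).count();

        if (ms > r.slowest_ms) r.slowest_ms = ms;
        if (n < 0) {
            r.first_errno = errno;
            break;
        }
        ++r.completed;
        std::fprintf(stderr, "write %d: %zd bytes in %.1f ms\n", i, n, ms);
    }

    // Only record an fsync()/close() error if nothing failed earlier.
    if (r.first_errno == 0 && fsync(fd) < 0) r.first_errno = errno;
    if (close(fd) < 0 && r.first_errno == 0) r.first_errno = errno;
    return r;
}
```

Calling timed_writes("/mnt/new.bin", 10000, 100000) would mirror the loop
in the original test program; if the last line printed to stderr stops
advancing after the cable is pulled, the call is really blocked rather
than slow.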

On Thu, Jan 2, 2014 at 1:31 PM, Jeff Layton <jlayton@xxxxxxxxx> wrote:
> On Thu, 2 Jan 2014 17:04:27 +0100 (CET)
> mail654@xxxxxx wrote:
>
>> > > write() from cifs kernel driver blocks when disconnecting the cifs server. The blocking call didn't return after 30 minutes. Client and server are connected via a switch and server's LAN cable is unplugged during the write call. I use kernel 3.11.8 and mounted without "hard" option.
>> > >
>> > > Is there a way to get a non-blocking write() without using O_SYNC or the "directio" mount option?
>> > >
>> > > Way to reproduce the scenario: Below is a sample program which calls write() in a loop. The error messages appear when unplugging the cable during this loop.
>> > >
>> > > Kind regards,
>> > > Hagen
>> > >
>> > > CIFS VFS: sends on sock ffff88003710c280 stuck for 15 seconds
>> > > CIFS VFS: Error -11 sending data on socket to server
>> > >
>> > > #include <fstream>
>> > > #include <iostream>
>> > > int main () {
>> > >   const int size = 100000;
>> > >   char buffer[size] = {};  // zero-initialize so defined data is written
>> > >   std::ofstream outfile("/mnt/new.bin",std::ofstream::binary);
>> > >   if (!outfile.is_open())
>> > >   {
>> > >     return 1;
>> > >   }
>> > >   for (int idx=0; idx<10000 && outfile.good(); idx++)
>> > >   {
>> > >     outfile.write(buffer,size);
>> > >     std::cout << "written, size=" << size << std::endl;
>> > >   }
>> > >   std::cout << "finished " << outfile.good() << std::endl;
>> > >   outfile.close();
>> > >   return 0;
>> > > }
>> >
>> > A hang of that length is unexpected. If you're able to reproduce this,
>> > can you get the stack from the task issuing the write at the time?
>> >
>> >     $ cat /proc/<pid>/stack
>> >
>> > That might give us a clue as to what it's doing.
>>
>> [<ffffffff8170ab8c>] balance_dirty_pages.isra.19+0x4ac/0x55c
>> [<ffffffff8115455b>] balance_dirty_pages_ratelimited+0xeb/0x110
>> [<ffffffff81148f3a>] generic_perform_write+0x16a/0x210
>> [<ffffffff8114903d>] generic_file_buffered_write+0x5d/0x90
>> [<ffffffff8114aa66>] __generic_file_aio_write+0x1b6/0x3b0
>> [<ffffffff8114acc9>] generic_file_aio_write+0x69/0xd0
>> [<ffffffffa03ef225>] cifs_strict_writev+0xa5/0xd0 [cifs]
>> [<ffffffff811b2b95>] do_sync_readv_writev+0x65/0x90
>> [<ffffffff811b4312>] do_readv_writev+0xd2/0x2b0
>> [<ffffffff811b452c>] vfs_writev+0x3c/0x50
>> [<ffffffff811b46a2>] SyS_writev+0x52/0xc0
>> [<ffffffff8172976f>] tracesys+0xe1/0xe6
>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>
> Looks like it's stuck in dirty page throttling.
>
> What's likely happening is that you have a bunch of dirty pages when
> you go to pull the cable. At that point the system is trying to flush
> the pages so that this task can try to dirty more of them.
>
> What *should* happen (at least if this is a soft mount) is that the
> writeback of those pages eventually times out, the pages get their
> error bit set and eventually the write() syscalls go through.
>
> Have you tried stracing this, and can you tell whether the write
> syscall really never returns in this situation? Is it possible that the
> write() syscalls are returning, albeit slowly?
>
> --
> Jeff Layton <jlayton@xxxxxxxxx>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html