Re: [cifs-protocol] cifs client timeouts and hard/soft mounts

Shirish Pargaonkar <shirishpargaonkar@xxxxxxxxx> · Sat, 4 Dec 2010 08:46:53 -0600

On Sat, Dec 4, 2010 at 8:22 AM, Jeff Layton <jlayton@xxxxxxxxx> wrote:
> On Sat, 4 Dec 2010 08:06:46 -0600
> Shirish Pargaonkar <shirishpargaonkar@xxxxxxxxx> wrote:
>
>> On Sat, Dec 4, 2010 at 7:09 AM, Jeff Layton <jlayton@xxxxxxxxx> wrote:
>> > On Sat, 4 Dec 2010 06:25:07 -0600
>> > Shirish Pargaonkar <shirishpargaonkar@xxxxxxxxx> wrote:
>> >
>> >> On Sat, Dec 4, 2010 at 5:44 AM, Jeff Layton <jlayton@xxxxxxxxx> wrote:
>> >>
>> >> > On Sat, 4 Dec 2010 09:13:21 +0100
>> >> > Volker Lendecke <Volker.Lendecke@xxxxxxxxx> wrote:
>> >> >
>> >> > > On Fri, Dec 03, 2010 at 09:54:13PM -0600, Christopher R. Hertel wrote:
>> >> > > > That may seem to be in the "who cares" category, since those old transports
>> >> > > > are essentially dead (much more dead than NBT, or even NBF). Unfortunately,
>> >> > > > the code to handle the old transports is still there in Windows, so there
>> >> > > > are behaviors -- things like the timeouts you're talking about and the weird
>> >> > > > VC=0 shutdown behavior -- that exist because of these old disused transports.
>> >> > >
>> >> > > VC=0, how does Windows treat this facing NAT (masquerading)
>> >> > > networks? I've done tests in the past where Windows killed
>> >> > > valid connections from behind a NAT box when a new client
>> >> > > came in.
>> >> > >
>> >> > > Volker
>> >> >
>> >> > It seems like the best way to deal with this on the server side with
>> >> > direct hosted TCP would be to treat VC=0 like any other VC number
>> >> > (MS-CIFS says that this is allowed).
>> >> >
>> >> > Ideally any new connection event from a host however should make the
>> >> > server check the validity of any other connection from the same host.
>> >> > That way you could release resources held by dead connections in case
>> >> > the new one is a reconnect and needs to reclaim state.
>> >> >
>> >> > The question is how to check that validity. Unfortunately, the best you
>> >> > can probably do is rely on TCP keepalives.
>> >> >
>> >> > --
>> >> > Jeff Layton <jlayton@xxxxxxxxx>
>> >> >  --
>> >> > To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
>> >> > the body of a message to majordomo@xxxxxxxxxxxxxxx
>> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >> >
>> >>
>> >>
>> >> Is the SMB Echo command the only way to determine whether to reconnect or not?
>> >> The assumption here is that the SMB server is unresponsive.
>> >> There could be other circumstances on the server box (or even the client box)
>> >> that are slowing down the SMB server responses, such as a slow network, a slow
>> >> network stack, memory pressure etc.
>> >> So the server could be fine all along and yet the client would ask for reconnection!
>> >
>> > I think it's the best mechanism that the protocol has. If we aren't
>> > going to use SMB echoes to detect an unresponsive server, then what
>> > would you suggest? I don't think we can make calls wait indefinitely
>> > for a response without a mechanism to determine when the server is gone
>> > and attempt to reestablish the connection to it.
>> >
>> > --
>> > Jeff Layton <jlayton@xxxxxxxxx>
>> >
>>
>> I think we should use the smb echo command as a means to let users know the
>> state of the mount/server and let them decide.
>> If an smb echo times out, the cifs client should just log it and stop at that, and
>> the very next request that receives a response (if any does) should log that the
>> server is responding.
>
> Could you elaborate a bit?
>
> What do you mean by "let them decide" -- what steps would someone take
> if the server stopped responding? Aside from killing the application,
> what recourse would they have?
>
> --
> Jeff Layton <jlayton@xxxxxxxxx>
>

Jeff, I am not sure.  Here is where I am coming from:

I have a bug open where an SMB server is slow to respond to a cifs
client. If the cifs client reconnects, the result is data corruption
on the server. If left alone, the responses from the server eventually
make it through (without any intervention) and the tests pass.

If an SMB server is unresponsive, how do we know that it will respond to
a reconnect, or that a reconnect will help?  I do not know enough about
SMB servers to describe an unresponsive server, i.e. how and when
it came to be unresponsive, how it handles the transport layer in that
state, whether it corrects itself or how to correct it, how it handles
the underlying physical file system, etc.
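
To make the log-only idea quoted above a bit more concrete, here is a
minimal sketch in Python. It is not cifs client code, and the names
(EchoMonitor, on_echo_timeout, on_response) are hypothetical, chosen only
for illustration. The assumption is that something else sends the echoes
and tells the monitor when one times out or when any response arrives.
The monitor never reconnects; it only logs the two state changes.

```python
import logging

log = logging.getLogger("echo_monitor")  # hypothetical logger name


class EchoMonitor:
    """Log-only tracker of server responsiveness (illustrative sketch).

    It never triggers a reconnect. It records a message when an echo
    times out and another when a later response shows the server is back.
    """

    def __init__(self):
        self.responsive = True  # assume the server is fine at mount time
        self.events = []        # state-change messages, in order

    def on_echo_timeout(self):
        # Log only the first timeout of an unresponsive period,
        # so repeated timeouts do not flood the log.
        if self.responsive:
            self.responsive = False
            self._record("server not responding")

    def on_response(self):
        # Any response (echo or regular request) proves the server is
        # alive, so log the recovery once.
        if not self.responsive:
            self.responsive = True
            self._record("server is responding")

    def _record(self, message):
        self.events.append(message)
        log.warning(message)


if __name__ == "__main__":
    monitor = EchoMonitor()
    monitor.on_echo_timeout()   # first timeout is logged
    monitor.on_echo_timeout()   # repeat timeout is not logged again
    monitor.on_response()       # recovery is logged
    print(monitor.events)
```

Whether logging alone is enough is exactly the open question in this
thread; the sketch only shows that the policy itself is simple to state.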