On Sat, 4 Dec 2010 08:46:53 -0600
Shirish Pargaonkar <shirishpargaonkar@xxxxxxxxx> wrote:

> Jeff, I am not sure. Basically I am coming from here:
>
> I have a bug open, where an SMB server when slow to respond
> (for a cifs client), if cifs client reconnects, causes data corruption
> on the server. If left to its own, responses from server eventually
> make through (without any intervention) and tests pass.
>

I have a very similar bug open and that's what prompted me to go down
this road. You may want to test the patchset I proposed for cifs
against your reproducer.

> If an SMB server is unresponsive, how do we know it will respond to
> a reconnect or a reconnect will help?
>
> I do not know enough about
> SMB servers to describe an unresponsive server i.e. how and when
> it came to be unresponsive, how it handles transport layer then,
> whether it corrects itself or how to correct it, how it handles
> underlying physical file system etc..

A reconnect may not help. The problem we have today, however, is that
the Linux CIFS client is too cavalier with reconnects. It reconnects the
socket any time a call has taken longer than an arbitrary timeout. It
tries to deal with that by varying the timeout with the type of call,
but I think that's a broken model that fails in many situations.

It's impossible to predict how long it'll take the server to service a
particular call, as we can never be sure what the load on the server
and underlying storage is. A QPathInfo call may take just as long as a
write past EOF if the storage is being hammered.

The scheme I'm proposing makes the assumption that even when the server
is loaded, it'll still be able to respond to an echo. That may also
fail in certain situations, but empirical evidence has shown that it's
generally true. This scheme won't fix every failure scenario, but it
should help in the vast majority of situations where the server is
simply being slow to respond to a particular call.
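
To make the difference between the two models concrete, here is a
rough sketch in Python. It is not the code from the patchset. The
names (ServerState, should_reconnect_per_call, should_reconnect_echo),
the 60-second echo interval and the "silent for two intervals"
threshold are all assumptions picked for illustration.

```python
from dataclasses import dataclass

ECHO_INTERVAL = 60.0  # seconds; hypothetical value, for illustration only


@dataclass
class ServerState:
    # Time at which any response (including an echo reply) last arrived.
    last_response: float


def should_reconnect_per_call(sent_at, now, call_timeout):
    """Per-call model: reconnect as soon as one call exceeds its timeout."""
    return now - sent_at > call_timeout


def should_reconnect_echo(state, now, echo_interval=ECHO_INTERVAL):
    """Echo model: reconnect only if the server has sent nothing at all,
    not even an echo reply, for two echo intervals (assumed threshold)."""
    return now - state.last_response > 2 * echo_interval


# A write sent at t=0 is still outstanding at t=100, but the server
# answered an echo at t=90, so it is slow rather than dead.
state = ServerState(last_response=90.0)
per_call = should_reconnect_per_call(sent_at=0.0, now=100.0, call_timeout=45.0)
echo_based = should_reconnect_echo(state, now=100.0)

# A server that has been completely silent since t=0, checked at t=200.
silent = should_reconnect_echo(ServerState(last_response=0.0), now=200.0)
```

In this sketch the per-call check would tear down the socket while the
slow write is still in flight, whereas the echo check leaves it alone
and only gives up on the server that has stopped answering entirely.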

I'm not opposed to what you're proposing, but it seems like a more
radical step than what I have proposed. We'd need to understand what
recourse the user would have in practice and what the behavior will be
in various failure scenarios.

Leaving the processes hung and logging a message when the server isn't
responding isn't going to be very helpful if there's nothing that can
be done about it.

-- 
Jeff Layton <jlayton@xxxxxxxxx>