On 10/05/2010 09:53:40 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:

> > Retrying indefinitely is actually the correct behavior I believe. If
> > I'm writing or reading files to/from CIFS, then the last thing I want
> > is for the kernel to corrupt that data or start returning errors just
> > because the network is having problems.

Yes, I can see your point of view regarding retrying indefinitely.
Indeed, it would generally be a bad thing if a connection gave up in the
middle of an operation where it otherwise would not have; however, I
believe this violates basic POSIX semantics by not letting userspace
decide what to do when a resource is deemed unavailable. For example, if
I had a non-system hard drive in the system and that drive failed for
some reason, I would expect something like EIO to be returned after a
period of time. Even more relevant are the POSIX socket semantics of
returning errors for various connection failures, like ETIMEDOUT,
ENOTCONN, and ECONNRESET.

> > However, none of this should cause the client to start returning errors
> > to userspace. That doesn't make for robustness in the face of network
> > partitions. If, however, the processes waiting on syscalls are
> > interrupted with a SIGKILL, we probably ought to return an error to
> > userspace (probably an EINTR).

Yes, that does remind me that those processes that are deadlocked cannot
be killed, and that includes attempts to unmount the share. This, in
effect, creates what I call the ten-car pile-up, whereby processes just
start piling up in the 'D' state like wrecks on a busy highway.

Because of these behaviors, I think not returning errors to userspace is
actually detrimental to robustness, because it can leave the system in
an unrecoverable state (especially an embedded system). Take a photo
viewer app for instance.
If there were an embedded device (a digital photo frame) that mounted a
cifs share read-only and created a slide show of the JPEGs, would you
expect to have to power down the device (due to the app being stuck in a
stat() or read()) because someone closed the lid on the laptop that
hosted the share and it went to sleep? Besides, if a laptop goes into
standby, chances are that the network card has been powered down and the
TCP stack torn down anyway.

> Note that the above is just my opinion on the matter... I'm open to
> suggestions and other opinions on how it should behave.

I appreciate you taking the time to have this discussion with me.
Hopefully, other cifs developers will chime in so that we can get a good
exchange of ideas going.

> One of the main problems with CIFS in this and other matters has been a
> lack of clarity on what the behavior should be. Having a clear
> behavioral goal in mind for the code before we embark on changes is a
> necessity I think.

I think you hit the nail on the head here. I've found that most of the
roadblocks we face as engineers involve not being on the same page.

If I had to choose, I would go with socket semantics for networked file
systems, although the issue is complicated by write-backs, data
consistency, inode caches, etc. From a userspace point of view, I would
expect a read() or write() to return ECONNRESET once the connection to
the server has been unrecoverably severed. I don't know how cifs handles
the non-blocking case, but that should also be spec'd out. For things
like stat(), either EIO or ENOENT would be fitting, I suppose.

I'll put the idea on to simmer while I take care of setting up a build
environment for a new product. More to follow.

Regards,
David

--
David Kondrad
Software Design Engineer
Home Systems Division
Legrand, North America
717.546.5442
david.kondrad@xxxxxxxxxx
www.legrand.us/onq