> From: "Brian Hawley" <bhawley@xxxxxxxxxxx>
>
> I ended up writing a "manage_mounts" script run by cron that compares
> /proc/mounts and the fstab, used ping, and "timeout" messages in
> /var/log/messages to identify filesystems that aren't responding, repeatedly
> do umount -f to force i/o errors back to the calling applications; and when
> missing mounts (in fstab but not /proc/mounts) but were now pingable,
> attempt to remount them.
>
>
> For me, timeo and retrans are necessary, but not sufficient. The chunking to
> rsize/wsize and caching plays a role in how well i/o errors get relayed back
> to the applications doing the i/o.
>
> You will certainly lose data in these scenario's.
>
> It would be fantastic if somehow the timeo and retrans were sufficient (ie
> when they fail, i/o errors get back to the applications that queued that i/o
> (or even the i/o that cause the application to pend because the rsize/wsize
> or cache was full).
>
> You can eliminate some of that behavior with sync/directio, but performance
> becomes abysmal.
>
> I tried "lazy" it didn't provide the desired effect (they unmounted which
> prevented new i/o's; but existing I/o's never got errors).

This is the problem I am having - I can unmount the filesystem with -l, but
once it is unmounted the existing apache processes are still stuck forever.

Does repeatedly running "umount -f" instead of "umount -l" as you describe
return I/O errors back to existing processes and allow them to stop?

> From: "Jim Rees" <rees@xxxxxxxxx>

> Given this is apache, I think if I were doing this I'd use ro,soft,intr,tcp
> and not try to write anything to nfs.

I was using tcp,bg,soft,intr when this problem occurred. I do not know if
apache was attempting to do a write or a read, but it seems that
tcp,soft,intr was not sufficient to prevent the problem.
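
For anyone following along, here is a rough sketch of just the comparison step Brian describes (in fstab but not in /proc/mounts). It is not his manage_mounts script, which was not posted. The function names are made up for illustration, and the sketch assumes both files use the usual "device mountpoint fstype options" column layout. It does not handle noauto entries or octal escapes such as \040 in mount points, and it does none of the ping, log-scanning, or umount -f work.

```python
def nfs_mountpoints(table_text):
    """Return the set of NFS mount points found in a mount table.

    Hypothetical helper: accepts the text of /etc/fstab or /proc/mounts,
    since both use 'device mountpoint fstype options ...' columns.
    """
    points = set()
    for line in table_text.splitlines():
        line = line.strip()
        # Skip blank lines and fstab comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Third column is the filesystem type; keep only NFS entries.
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            points.add(fields[1])
    return points


def missing_mounts(fstab_text, proc_mounts_text):
    """Return NFS mount points listed in fstab but absent from /proc/mounts."""
    wanted = nfs_mountpoints(fstab_text)
    mounted = nfs_mountpoints(proc_mounts_text)
    return sorted(wanted - mounted)
```

From cron you would read /etc/fstab and /proc/mounts, pass their contents to missing_mounts(), and only then decide whether the server is pingable and a remount is worth attempting.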