On Fri, 20 Jun 2008 11:41:26 -0500 "Steve French (smfltc)" <smfltc@xxxxxxxxxx> wrote: > Jeff Layton wrote: > > On Fri, 20 Jun 2008 11:19:19 -0500 > > "Steve French (smfltc)" <smfltc@xxxxxxxxxx> wrote: > > > > > >> If flush fails to write all dirty pages (due to an I/O error on the > >> server, server disk or networking stack) today the error (EIO) is marked > >> in the inode, and returned on close. I think cifs_flush (which is > >> called before close by the vfs) should also (perhaps after sleep a > >> second or so then) retry at least once on the filemap_fdatawrite before > >> giving up. (perhaps retry more if mounted hard) Thoughts? > >> > >> > > > > A couple of thoughts... > > > > Retrying is only likely to be helpful if the server isn't responding. We > > could consider doing a better job there somehow. > > > > > The particular problem case that I am thinking of at the moment, and > wish is helped by retry, is > the case in which memory pressure prevents the TCP/IP stack or > underlying (perhaps badly > written) network adapter driver from allowing the SMB write packet from > even getting to > the wire. I'll buy that...though I'm not sure how often this is really an issue. I don't think I've seen any cases of memory allocations failing and preventing CIFS from writing out pages. I'm also not convinced that we'd see a lot of success from retrying in these situations. I have seen problems with NFS/RPC hitting deadlocks when trying to do sleeping allocations in a write. rpciod tries to do a __GFP_WAIT allocation, and due to memory pressure ends up trying to write out NFS pages -- deadlock. A lot of these are now fixed in mainline, but we still have some less traveled codepaths in NFS/RPC that could deadlock this way. We're probably being a bit too optimistic with how we do memory allocations in CIFS in places, so we may be subject to similar deadlocks. These types of problems concern me more than retrying failed page writeouts... > > If you want to be more aggressive about handling errors when writing > > out pages, then most of the changes will need to be made at the > > cifs_writepages level, not so much with cifs_flush. > > > flush is our "last chance" effort to write the file data - once flush > and close are called the > file handle is gone so we can no longer write the file data after that > point. Right, but with the current implementation, once filemap_fdatawrite returns, any pages that that run touched are either written out or discarded. Retrying the filemap_fdatawrite from cifs_flush won't help. This could be redesigned, but we're probably better off fleshing out cifs_writepages to handle this based on some sort of condition (maybe the WB_SYNC_* flags or something). -- Jeff Layton <jlayton@xxxxxxxxxx> -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html