Re: flush and EIO errors when writepages fails

Jeff Layton <jlayton@xxxxxxxxxx> · Fri, 20 Jun 2008 13:12:25 -0400

On Fri, 20 Jun 2008 11:41:26 -0500
"Steve French (smfltc)" <smfltc@xxxxxxxxxx> wrote:

> Jeff Layton wrote:
> > On Fri, 20 Jun 2008 11:19:19 -0500
> > "Steve French (smfltc)" <smfltc@xxxxxxxxxx> wrote:
> >
> >   
> >> If flush fails to write all dirty pages (due to an I/O error on the 
> >> server, server disk or networking stack) today the error (EIO) is marked 
> >> in the inode, and returned on close.   I think cifs_flush (which is 
> >> called before close by the vfs) should also (perhaps after sleep a 
> >> second or so then) retry at least once on the filemap_fdatawrite before 
> >> giving up.  (perhaps retry more if mounted hard) Thoughts?
> >>
> >>     
> >
> > A couple of thoughts...
> >
> > Retrying is only likely to be helpful if the server isn't responding. We
> > could consider doing a better job there somehow.
> >
> >   
> The particular problem case that I am thinking of at the moment, and 
> wish is helped by retry, is
> the case in which memory pressure prevents the TCP/IP stack or 
> underlying (perhaps badly
> written) network adapter driver from allowing the SMB write packet from 
> even getting to
> the wire.

I'll buy that...though I'm not sure how often this is really an issue.
I don't think I've seen any cases of memory allocations failing and
preventing CIFS from writing out pages. I'm also not convinced that
we'd see a lot of success from retrying in these situations.

I have seen problems with NFS/RPC hitting deadlocks when trying to do
sleeping allocations in a write. rpciod tries to do a __GFP_WAIT
allocation, and due to memory pressure ends up trying to write out NFS
pages -- deadlock. A lot of these are now fixed in mainline, but we
still have some less traveled codepaths in NFS/RPC that could deadlock
this way.

We're probably being a bit too optimistic with how we do memory
allocations in CIFS in places, so we may be subject to similar
deadlocks. These types of problems concern me more than retrying failed
page writeouts...

> > If you want to be more aggressive about handling errors when writing
> > out pages, then most of the changes will need to be made at the
> > cifs_writepages level, not so much with cifs_flush.
> >   
> flush is our "last chance" effort to write the file data - once flush 
> and close are called the
> file handle is gone so we can no longer write the file data after that 
> point.

Right, but with the current implementation, once filemap_fdatawrite
returns, any pages that that run touched are either written out or
discarded. Retrying the filemap_fdatawrite from cifs_flush won't help.

This could be redesigned, but we're probably better off fleshing out
cifs_writepages to handle this based on some sort of condition (maybe
the WB_SYNC_* flags or something).

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html