Re: flush and EIO errors when writepages fails

Jeff Layton <jlayton@xxxxxxxxxx> · Sat, 21 Jun 2008 08:27:42 -0400

On Sat, 21 Jun 2008 11:05:56 +0400
Evgeniy Polyakov <johnpol@xxxxxxxxxxx> wrote:

> Hi.
> 
> On Fri, Jun 20, 2008 at 05:34:21PM -0500, Steve French (smfrench@xxxxxxxxx) wrote:
> > > Right, but with the current implementation, once filemap_fdatawrite
> > > returns, any pages that that run touched are either written out or
> > > discarded.
> 
> Depending on writepages() implementation, it is not always the case.
> 

Right. I meant with the current cifs_writepages() implementation. It
looks like when the write fails we walk the pagevec and do:

                                if (rc)
                                        SetPageError(page);
                                kunmap(page);
                                unlock_page(page);
                                end_page_writeback(page);
                                page_cache_release(page);

So I'm not certain that the data is actually discarded. I'm no expert
in page accounting, but I'm pretty sure that we won't attempt to write
it out again from CIFS.

> > That could explain some problems if true.  When writepages fails, we
> > make the pages as in error (PG_error flag?) and presumably they are
> > still dirty.   Why in the world would anyone free the pages just
> > because we failed the first time and need to write them again later?
> > Do you where (presumably in /mm) pages could be freed that are still
> > dirty (it is hard to find where the PG_error flag is checked etc)?
> 
> You can clear writeback bit but leave/set dirty bit in completion
> callback for given request. You can manually insert page into radix tree
> with dirty tag. You can also lock page and do not allow to unlock it
> until you resent your data.
> 
> So, there is plenty of possibilities to break page accounting in own
> writepages() method :)
> 

I still think that if you get a hard error back from the server then
you're not likely to have much success on a second attempt to write out
the pages. Returning an error to userspace as soon as you realize that
it didn't work seems reasonable to me. A non-responding server, on the
other hand, may be a place that's recoverable.

Either way, if we really want to do a second attempt to write out the
pagevec, then adding some code to cifs_writepages that sleeps for a bit
and implements this seems like the thing to do. I'm not convinced that
it will actually make much difference, but it seems unlikely to hurt
anything.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html