Re: flush and EIO errors when writepages fails

On Sat, Jun 21, 2008 at 08:27:42AM -0400, Jeff Layton (jlayton@xxxxxxxxxx) wrote:
> Right. I meant with the current cifs_writepages() implementation. It
> looks like when the write fails we walk the pagevec and do:
> 
>                                 if (rc)
>                                         SetPageError(page);
>                                 kunmap(page);
>                                 unlock_page(page);
>                                 end_page_writeback(page);
>                                 page_cache_release(page);
> 
> So I'm not certain that the data is actually discarded. I'm no expert
> in page accounting, but I'm pretty sure that we won't attempt to write
> it out again from CIFS.

Yes, the current CIFS implementation does not allow much in the way of
failure recovery. The data is still in the page cache and the page is only
marked as being in error, so nothing prevents a subsequent write from
redirtying it.
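
For example (only a sketch, not the actual CIFS code), the error path quoted
above could keep the data around for another attempt by putting the page back
on the dirty list instead of only setting PG_error; this assumes the
writeback_control 'wbc' passed to ->writepages() is still available at that
point:

	if (rc) {
		SetPageError(page);
		/* keep the data: a later writeback pass will retry the page */
		redirty_page_for_writepage(wbc, page);
	}
	kunmap(page);
	unlock_page(page);
	end_page_writeback(page);
	page_cache_release(page);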

> > > That could explain some problems if true.  When writepages fails, we
> > > mark the pages as in error (PG_error flag?) and presumably they are
> > > still dirty.   Why in the world would anyone free the pages just
> > > because we failed the first time and need to write them again later?
> > > Do you know where (presumably in /mm) pages could be freed that are
> > > still dirty (it is hard to find where the PG_error flag is checked etc)?
> > 
> > You can clear the writeback bit but leave/set the dirty bit in the
> > completion callback for a given request. You can manually insert the page
> > into the radix tree with the dirty tag. You can also lock the page and not
> > unlock it until you have resent your data.
> > 
> > So there are plenty of possibilities for breaking page accounting in your
> > own writepages() method :)
> > 
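
To illustrate the completion-callback variant above: a hypothetical
write-completion helper (not real CIFS code) could mark the page dirty again
on error before finishing writeback, so the data stays in the page cache and
is picked up by the next writeback pass:

	static void example_write_done(struct page *page, int error)
	{
		if (error) {
			SetPageError(page);
			/* keep the data: dirty the page again so it is written later */
			set_page_dirty(page);
		}
		unlock_page(page);
		end_page_writeback(page);
	}
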
> 
> I still think that if you get a hard error back from the server then
> you're not likely to have much success on a second attempt to write out
> the pages. Returning an error to userspace as soon as you realize that
> it didn't work seems reasonable to me. A non-responding server, on the
> other hand, may be a case that's recoverable.
> 
> Either way, if we really want to do a second attempt to write out the
> pagevec, then adding some code to cifs_writepages that sleeps for a bit
> and implements this seems like the thing to do. I'm not convinced that
> it will actually make much difference, but it seems unlikely to hurt
> anything.

If the server returns a serious error then there is no other way except to
discard the data and report the error, but if the server does not respond, or
responds with EBUSY or that kind of error, then a subsequent write can succeed
and at least should not do any harm. As a simple approach, it is possible to
sleep for a bit and resend in writepages(), but it is also possible just to
return from the callback and allow the VFS to call it again (frequently it
will happen very soon, though) with the same pages (or even a larger chunk).
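
A rough sketch of that second option (assuming a hypothetical send helper
cifs_send_page_range() and the 'wbc', 'mapping', 'pvec' and 'nr_pages'
variables from the surrounding cifs_writepages() loop; not the actual code):

	rc = cifs_send_page_range(mapping, pvec.pages, nr_pages); /* hypothetical */
	for (i = 0; i < nr_pages; i++) {
		struct page *page = pvec.pages[i];

		if (rc == -EAGAIN || rc == -EBUSY) {
			/* transient failure: keep the page dirty and let the
			 * VFS call ->writepages() again later */
			redirty_page_for_writepage(wbc, page);
		} else if (rc) {
			/* hard failure: give up on this data */
			SetPageError(page);
			mapping_set_error(mapping, rc);
		}
		kunmap(page);
		unlock_page(page);
		end_page_writeback(page);
		page_cache_release(page);
	}
	if (rc == -EAGAIN || rc == -EBUSY)
		rc = 0;		/* retryable: do not report a hard error yet */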

-- 
	Evgeniy Polyakov
