Re: fsync() errors is unsafe and risks data loss

Matthew Wilcox <willy@xxxxxxxxxxxxx> · Wed, 18 Apr 2018 17:40:37 -0700

On Thu, Apr 19, 2018 at 10:13:43AM +1000, Dave Chinner wrote:
> On Fri, Apr 13, 2018 at 07:38:14PM -0700, Matthew Wilcox wrote:
> > On Sat, Apr 14, 2018 at 11:47:52AM +1000, Dave Chinner wrote:
> > > On Fri, Apr 13, 2018 at 07:02:32AM -0700, Matthew Wilcox wrote:
> > > > 1. If we get an error while wbc->for_background is true, we should not clear
> > > >    uptodate on the page, rather SetPageError and SetPageDirty.
> > > 
> > > So you're saying we should treat it as a transient error rather than
> > > a permanent error.
> > 
> > Yes, I'm proposing leaving the data in memory in case the user wants to
> > try writing it somewhere else.
> 
> And if it's getting IO errors because of USB stick pull? What
> then?

I've been thinking about this.  Ideally we want to pass some kind of
notification all the way up to the desktop and tell the user to plug the
damn stick back in.  Then have the USB stick become the same blockdev
that it used to be, and complete the writeback.  We are so far from
being able to do that right now that it's not even funny.

> > > > 2. Background writebacks should skip pages which are PageError.
> > > 
> > > That seems decidedly dodgy in the case where there is a transient
> > > error - it requires a user to specifically run sync to get the data
> > > to disk after the transient error has occurred. Say they don't
> > > notice the problem because it's fleeting and doesn't cause any
> > > obvious problems?
> > 
> > That's fair.  What I want to avoid is triggering the same error every
> > 30 seconds (or whatever the periodic writeback threshold is set to).
> 
> So if kernel ring buffer overflows and so users miss the first error
> report, they'll have no idea that the data writeback is still
> failing?

I wasn't thinking about kernel-ring-buffer-based reporting; I was thinking
about errseq_t-based reporting, so the application can tell that the fsync
failed and maybe do something application-level to recover, like sending
the transactions across to another node in the cluster (or whatever this
hypothetical application does).
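
To make the application side concrete, here is a minimal userspace sketch
of that kind of recovery, assuming the application still holds its own copy
of the data. The helper names write_durably() and store_with_fallback(),
and the idea of a second "fallback" path standing in for "another node",
are illustrative assumptions, not anything proposed in this thread. Note
that the descriptor stays open until fsync() has returned, so there is
someone for the writeback error to be reported to.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write buf to path and fsync it.
 * Returns 0 on success, or a negative errno value on failure. */
static int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -errno;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            int err = errno;
            close(fd);
            return -err;
        }
        p += n;
        left -= (size_t)n;
    }

    /* Keep fd open across fsync() so a writeback error is reported here. */
    if (fsync(fd) < 0) {
        int err = errno;
        close(fd);
        return -err;
    }
    if (close(fd) < 0)
        return -errno;
    return 0;
}

/* Hypothetical recovery policy: if the primary location fails, rewrite the
 * data from the application's own buffer to a fallback location instead of
 * retrying fsync() on the failed descriptor. */
static int store_with_fallback(const char *primary, const char *fallback,
                               const void *buf, size_t len)
{
    int err = write_durably(primary, buf, len);
    if (err == 0)
        return 0;
    fprintf(stderr, "%s: %s; trying %s\n", primary, strerror(-err), fallback);
    return write_durably(fallback, buf, len);
}
```

The fallback deliberately rewrites from the caller's buffer rather than
trusting whatever is left in the page cache after the failure.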

> > > > 3. for_sync writebacks should attempt one last write.  Maybe it'll
> > > >    succeed this time.  If it does, just ClearPageError.  If not, we have
> > > >    somebody to report this writeback error to, and ClearPageUptodate.
> > > 
> > > Which may well be unmount. Are we really going to wait until unmount
> > > to report fatal errors?
> > 
> > Goodness, no.  The errors would be immediately reportable using the wb_err
> > mechanism, as soon as the first error was encountered.
> 
> But if there are no open files when the error occurs, that error
> won't get reported to anyone. Which means the next time anyone
> accesses that inode from a user context could very well be unmount
> or a third party sync/syncfs()....

Right.  But then that's on the application.
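
For reference, the three numbered points quoted above can be put together
as a toy userspace model. This is only a sketch of the proposed policy, not
kernel code: struct toy_page, enum toy_wb_mode and toy_writeback() are
made-up names, the io_ok argument stands in for the result of the real I/O,
and the -EIO return value stands in for "an error gets recorded in wb_err".
Leaving the error flag set after a failed sync write is an assumption of
the model.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-ins for the page flags discussed in the thread. */
struct toy_page {
    bool uptodate;
    bool dirty;
    bool error;
};

enum toy_wb_mode { TOY_WB_BACKGROUND, TOY_WB_SYNC };

/* Model one writeback attempt on one page. io_ok says whether the
 * (simulated) write would succeed. Returns 0, or -EIO when an error
 * would be recorded for later reporting. */
static int toy_writeback(struct toy_page *pg, enum toy_wb_mode mode, bool io_ok)
{
    if (!pg->dirty)
        return 0;

    if (mode == TOY_WB_BACKGROUND) {
        /* Point 2: background writeback skips pages already in error. */
        if (pg->error)
            return 0;
        if (io_ok) {
            pg->dirty = false;
            return 0;
        }
        /* Point 1: keep the data (uptodate untouched), mark error + dirty. */
        pg->error = true;
        pg->dirty = true;
        return -EIO;
    }

    /* Point 3: sync writeback makes one last attempt. */
    if (io_ok) {
        pg->dirty = false;
        pg->error = false;
        return 0;
    }
    /* There is a caller to report to, so the data is given up on. */
    pg->error = true;      /* assumption of this model */
    pg->dirty = false;
    pg->uptodate = false;
    return -EIO;
}
```

In this model a page that failed background writeback stays dirty and
uptodate until a sync writeback either succeeds or reports the failure.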