Re: [PATCH] fs: clear writeback errors in inode_init_always

Jeff Layton <jlayton@xxxxxxxxxx> · Wed, 23 May 2018 06:56:00 -0400

On Wed, 2018-05-23 at 08:09 +1000, Dave Chinner wrote:
> On Tue, May 22, 2018 at 06:30:40AM -0400, Jeff Layton wrote:
> > On Mon, 2018-05-21 at 10:54 -0700, Darrick J. Wong wrote:
> > > On Sat, May 19, 2018 at 08:36:19AM -0700, Matthew Wilcox wrote:
> > > > On Sat, May 19, 2018 at 08:27:00AM -0700, Darrick J. Wong wrote:
> > > > > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > > > 
> > > > > In inode_init_always(), we clear the inode mapping flags, which clears
> > > > > any retained error (AS_EIO, AS_ENOSC) bits.  Unfortunately, we do not
> > > > 
> > > > typo of ENOSPC in case you do a new version
> > > 
> > > Fixed, thanks.
> > > 
> > > > > also clear wb_err, which means that old mapping errors can leak through
> > > > > to new inodes.
> > > > > 
> > > > > This is crucial for the XFS inode allocation path because we recycle old
> > > > > in-core inodes and we do not want error state from an old file to leak
> > > > > into the new file.  This bug was discovered by running generic/036 and
> > > > > generic/047 in a loop and noticing that the EIOs generated by the
> > > > > collision of direct and buffered writes in generic/036 would survive the
> > > > > remount between 036 and 047, and get reported to the fsyncs (on
> > > > > different files on a reformatted fs!) in generic/047.
> > > > > 
> > > > > Since we're changing the semantics of inode_init_always, we must also
> > > > > change xfs_reinit_inode to retain the writeback error state when we go
> > > > > to recover an inode that has been torn down in the vfs but not yet
> > > > > disposed of by XFS.
> > > > 
> > > > Don't you also need to preserve inode->i_mapping->flags across the
> > > > reinitialisation for the users which aren't yet using ->wb_err?
> > > 
> > > At first I thought (mistakenly) that xfs had been completely converted,
> > > but digging further we still use the old filemap_check_errors.  It seems
> > > reasonable to me that if we're going to resurrect an incore inode then
> > > we should try to hang on to AS_EIO/AS_ENOSPC for as long as we can.
> > > 
> > > 
> > 
> > Yes, I think the patch you sent makes sense.
> > 
> > Most of the XFS callers are using filemap_write_and_wait{_range}, and
> > most of those seem to be called from ioctls (->setattr op being the
> > notable exception).
> > 
> > We could (in principle) pass a pointer to file->f_wb_err to most of
> > those callers, and use that to check for errors instead of looking for
> > AS_EIO/AS_ENOSPC. We wouldn't want to advance the error cursor for those
> > (as that should only be done in the context of an fsync), but it might
> > be more reliable than using the flags here.
> 
> Why wouldn't we advance the error pointer? The data writeback error
> caused an operation to fail and the error has been reported to
> userspace, so how is that different to fsync() failing and reporting
> the error to userspace?
> 
> This seems like a slippery slope of inconsistency to me, where data
> writeback errors are only reported once on fsync(), but can be
> reported multiple times through different filesystem operations...
> 

If I get back an error on ioctl, why should I think that that has
anything to do with writeback? For example, suppose I do:

write = 0
ioctl = -EIO
fsync = 0

The ioctl returns an error but the fsync returns 0. It would be
reasonable to assume that my writes had made it to disk.

I think that we really do want to ensure that fsync always returns an
error if writeback failed since the last fsync. That was sort of the
point of the errseq_t work in the first place.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>