On Wed, 2018-05-23 at 08:09 +1000, Dave Chinner wrote: > On Tue, May 22, 2018 at 06:30:40AM -0400, Jeff Layton wrote: > > On Mon, 2018-05-21 at 10:54 -0700, Darrick J. Wong wrote: > > > On Sat, May 19, 2018 at 08:36:19AM -0700, Matthew Wilcox wrote: > > > > On Sat, May 19, 2018 at 08:27:00AM -0700, Darrick J. Wong wrote: > > > > > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > > > > > > > > > In inode_init_always(), we clear the inode mapping flags, which clears > > > > > any retained error (AS_EIO, AS_ENOSC) bits. Unfortunately, we do not > > > > > > > > typo of ENOSPC in case you do a new version > > > > > > Fixed, thanks. > > > > > > > > also clear wb_err, which means that old mapping errors can leak through > > > > > to new inodes. > > > > > > > > > > This is crucial for the XFS inode allocation path because we recycle old > > > > > in-core inodes and we do not want error state from an old file to leak > > > > > into the new file. This bug was discovered by running generic/036 and > > > > > generic/047 in a loop and noticing that the EIOs generated by the > > > > > collision of direct and buffered writes in generic/036 would survive the > > > > > remount between 036 and 047, and get reported to the fsyncs (on > > > > > different files on a reformatted fs!) in generic/047. > > > > > > > > > > Since we're changing the semantics of inode_init_always, we must also > > > > > change xfs_reinit_inode to retain the writeback error state when we go > > > > > to recover an inode that has been torn down in the vfs but not yet > > > > > disposed of by XFS. > > > > > > > > Don't you also need to preserve inode->i_mapping->flags across the > > > > reinitialisation for the users which aren't yet using ->wb_err? > > > > > > At first I thought (mistakenly) that xfs had been completely converted, > > > but digging further we still use the old filemap_check_errors. It seems > > > reasonable to me that if we're going to resurrect an incore inode then > > > we should try to hang on to AS_EIO/AS_ENOSPC for as long as we can. > > > > > > > > > > Yes, I think the patch you sent makes sense. > > > > Most of the XFS callers are using filemap_write_and_wait{_range}, and > > most of those seem to be called from ioctls (->setattr op being the > > notable exception). > > > > We could (in principle) pass a pointer to file->f_wb_err to most of > > those callers, and use that to check for errors instead of looking for > > AS_EIO/AS_ENOSPC. We wouldn't want to advance the error cursor for those > > (as that should only be done in the context of an fsync), but it might > > be more reliable than using the flags here. > > Why wouldn't we advance the error pointer? The data writeback error > caused an operation to fail and the error has been reported to > userspace, so how is that different to fsync() failing and reporting > the error to userspace? > > This seems like a slippery slope of inconsistency to me, where data > writeback errors are only reported once on fsync(), but can be > reported multiple times through different filesystem operations... > If I get back an error on ioctl, why should I think that that has anything to do with writeback? For example, suppose I do: write = 0 ioctl = -EIO fsync = 0 The ioctl returns an error but the fsync returns 0. It would be reasonable to assume that my writes had made it to disk. I think that we really do want to ensure that fsync always returns an error if writeback failed since the last fsync. That was sort of the point of the errseq_t work in the first place. -- Jeff Layton <jlayton@xxxxxxxxxx>