On Wed, 2017-05-31 at 14:37 -0700, Andrew Morton wrote:
> On Wed, 31 May 2017 17:31:49 -0400 Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>
> > On Wed, 2017-05-31 at 13:27 -0700, Andrew Morton wrote:
> > > On Wed, 31 May 2017 08:45:23 -0400 Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > >
> > > > This is v5 of the patchset to improve how we're tracking and reporting
> > > > errors that occur during pagecache writeback.
> > >
> > > I'm curious to know how you've been testing this? Is that testing
> > > strong enough for us to be confident that all nature of I/O errors
> > > will be reported to userspace?
> > >
> >
> > That's a tall order. This is a difficult thing to test as these sorts of
> > errors are pretty rare by nature.
> >
> > I have an xfstest that I posted just after this set that demonstrates
> > that it works correctly, at least on ext2/3/4 when run by the ext4
> > driver (ext2 legacy driver reports too many errors currently). I had
> > btrfs and xfs working on that test too in an earlier incarnation of this
> > set, so I think we can fix this in them as well without too much
> > difficulty.
> >
> > I'm happy to run other tests if someone wants to suggest them.
> >
> > Now, all that said, I don't think this will make things any worse than
> > they are today as far as reporting errors properly to userland goes.
> > It's rather easy for an incidental synchronous writeback request from an
> > internal caller to clear the AS_* flags today. This will at least ensure
> > that we're reporting errors since a well-defined point in time when you
> > call fsync.
>
> Were you using error injection of some form? If so, how was that all
> set up?
>

Yes, it uses dm-error for fault injection. The test basically does:

1) set up a dm-error device in a working configuration
2) build a scratch filesystem on it, with the log on a different device
   in some fashion so metadata writeback will still succeed.
3) open the same file several times
4) flip dm-error device to non-working mode
5) write to each fd
6) fsync each fd

...do you get back an error on each fsync? It then does a bit more to
make sure they're cleared afterward as you'd expect.

That works for most block device based filesystems. I also have a second
xfstest that opens a block device and does the same basic thing. That
also works correctly with this patch series.

I still need to come up with a way to simulate errors on other fs'
though. We may need to plumb in some kernel-level fault injection on
some fs' to do that correctly. Suggestions welcome there.

With this series though, the idea is to convert one filesystem at a
time, so I think that should help mitigate some of the risk.
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
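
For anyone who wants to poke at this from userspace, steps 3, 5 and 6 of
the procedure above, plus the follow-up "are they cleared afterward"
check, might be sketched roughly as follows. This is only an
illustration and not the actual xfstest: the names fsync_all and
write_then_fsync_twice, the flip_device callback and the 4k payload are
all hypothetical, and the dm-error setup and table switch (steps 1, 2
and 4) are assumed to be handled by the caller.

```python
import os


def fsync_all(fds):
    """fsync every fd; return None per fd on success, else the errno raised."""
    results = []
    for fd in fds:
        try:
            os.fsync(fd)
            results.append(None)
        except OSError as e:
            results.append(e.errno)
    return results


def write_then_fsync_twice(path, nr_fds=3, flip_device=None,
                           payload=b"x" * 4096):
    """Open `path` nr_fds times, write via each fd, then fsync each fd twice.

    flip_device is a hypothetical hook: if given, it is called after the
    opens and before the writes, which is where the test described above
    switches the dm-error device to its failing mode.

    Returns (first_pass, second_pass), each a list with one entry per fd.
    """
    # Step 3: open the same file several times.
    fds = [os.open(path, os.O_RDWR | os.O_CREAT, 0o600) for _ in range(nr_fds)]
    try:
        # Step 4 (delegated to the caller): make the device start failing.
        if flip_device is not None:
            flip_device()
        # Step 5: dirty the pagecache through every fd.
        for fd in fds:
            os.pwrite(fd, payload, 0)
        # Step 6: the first fsync on each fd should report the error.
        first_pass = fsync_all(fds)
        # Follow-up: with no new writes, a second fsync should be clean.
        second_pass = fsync_all(fds)
        return first_pass, second_pass
    finally:
        for fd in fds:
            os.close(fd)
```

On a healthy filesystem both passes should come back as all None. On a
filesystem sitting on a failing dm-error device, the behaviour the
series is aiming for would be an errno (presumably EIO) in every slot of
the first pass and None in every slot of the second.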