Re: [PATCH v7 00/22] fs: enhanced writeback error reporting with errseq_t (pile #1)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 2017-06-16 at 15:34 -0400, Jeff Layton wrote:
> v7:
> ===
> This is the seventh posting of the patchset to revamp the way writeback
> errors are tracked and reported.
> 
> The main difference from the v6 posting is the removal of the
> FS_WB_ERRSEQ flag. That requires a few other incremental patches in the
> writeback code to ensure that both error tracking models are handled
> in a suitable way.
> 
> Also, a bit more cleanup of the metadata writeback codepaths, and some
> documentation updates.
> 
> Some of these patches have been posted separately, but I'm re-posting
> them here to make it clear that they're prerequisites of the later
> patches in the series.
> 
> This series is based on top of linux-next from a day or so ago. I'd like
> to have this picked up by linux-next in the near future so we can get
> some more extensive testing with it. Should I just plan to maintain a
> topic branch on top of -next and ask Stephen to pick it up?
> 
> Background:
> ===========
> The basic problem is that we have (for a very long time) tracked and
> reported writeback errors based on two flags in the address_space:
> AS_EIO and AS_ENOSPC. Those flags are cleared when they are checked,
> so only the first caller to check them is able to consume them.
> 
> That model is quite unreliable, for several related reasons:
> 
> * only the first fsync caller on the inode will see the error. In a
>   world of containerized setups, that's no longer viable. Applications
>   need to know that their writes are safely stored, and they can
>   currently miss seeing errors that they should be aware of when
>   they're not.
> 
> * there are a lot of internal callers to filemap_fdatawait* and
>   filemap_write_and_wait* that clear these errors but then never report
>   them to userland in any fashion.
> 
> * Some internal callers report writeback errors, but can do so at
>   non-sensical times. For instance, we might want to truncate a file,
>   which triggers a pagecache flush. If that writeback fails, we might
>   report that error to the truncate caller, but a subsequent fsync
>   will likely not see it.
> 
> * Some internal callers try to reset the error flags after clearing
>   them, but that's racy. Another task could check the flags between
>   those two events.
> 
> Solution:
> =========
> This patchset adds a new datatype called an errseq_t that represents a
> sequence of errors. It's a u32, with a field for a POSIX-flavor error
> and a counter, managed with atomics. We can sample that value at a
> particular point in time, and can later tell whether there have been any
> errors since that point.
> 
> That allows us to provide traditional check-and-clear fsync semantics
> on every open file description in a lightweight fashion. fsync callers
> no longer need to coordinate between one another in order to ensure
> that errors at fsync time are handled correctly.
> 
> Strategy:
> =========
> The aim with this pile is to do the minimum possible to support for
> reliable reporting of errors on fsync, without substantially changing
> the internals of the filesystems themselves.
> 
> Most of the internal calls to filemap_fdatawait are left alone, so all
> of the internal error checkers are using the same error handling they
> always have. The only real difference here is that we're better
> reporting errors at fsync.
> 
> I think that we probably will want to eventually convert all of those
> internal callers to use errseq_t based reporting, but that can be done
> in an incremental fashion in follow-on patchsets.
> 
> Testing:
> ========
> I've primarily been testing this with some new xfstests that I will post
> in a separate series. These tests use dm-error fault injection to make
> the underlying block device start throwing I/O errors, and then test the
> how the filesystem layer reports errors after that.
> 
> Jeff Layton (22):
>   fs: remove call_fsync helper function
>   buffer: use mapping_set_error instead of setting the flag
>   fs: check for writeback errors after syncing out buffers in
>     generic_file_fsync
>   buffer: set errors in mapping at the time that the error occurs
>   jbd2: don't clear and reset errors after waiting on writeback
>   mm: clear AS_EIO/AS_ENOSPC when writeback initiation fails
>   mm: don't TestClearPageError in __filemap_fdatawait_range
>   mm: clean up error handling in write_one_page
>   fs: always sync metadata in __generic_file_fsync
>   lib: add errseq_t type and infrastructure for handling it
>   fs: new infrastructure for writeback error handling and reporting
>   mm: tracepoints for writeback error events
>   mm: set both AS_EIO/AS_ENOSPC and errseq_t in mapping_set_error
>   Documentation: flesh out the section in vfs.txt on storing and
>     reporting writeback errors
>   dax: set errors in mapping when writeback fails
>   block: convert to errseq_t based writeback error tracking
>   ext4: use errseq_t based error handling for reporting data writeback
>     errors
>   fs: add f_md_wb_err field to struct file for tracking metadata errors
>   ext4: add more robust reporting of metadata writeback errors
>   ext2: convert to errseq_t based writeback error tracking
>   xfs: minimal conversion to errseq_t writeback error reporting
>   btrfs: minimal conversion to errseq_t writeback error reporting on
>     fsync
> 
>  Documentation/filesystems/vfs.txt |  43 +++++++-
>  drivers/dax/device.c              |   1 +
>  fs/block_dev.c                    |   9 +-
>  fs/btrfs/file.c                   |   7 +-
>  fs/buffer.c                       |  20 ++--
>  fs/dax.c                          |   4 +-
>  fs/ext2/dir.c                     |   8 ++
>  fs/ext2/file.c                    |  26 ++++-
>  fs/ext4/dir.c                     |   8 +-
>  fs/ext4/file.c                    |   5 +-
>  fs/ext4/fsync.c                   |  28 ++++-
>  fs/file_table.c                   |   1 +
>  fs/gfs2/lops.c                    |   2 +-
>  fs/jbd2/commit.c                  |  15 +--
>  fs/libfs.c                        |  12 +--
>  fs/open.c                         |   3 +
>  fs/sync.c                         |   2 +-
>  fs/xfs/xfs_file.c                 |  15 ++-
>  include/linux/buffer_head.h       |   1 +
>  include/linux/errseq.h            |  19 ++++
>  include/linux/fs.h                |  67 ++++++++++--
>  include/linux/pagemap.h           |  31 ++++--
>  include/trace/events/filemap.h    |  52 ++++++++++
>  ipc/shm.c                         |   2 +-
>  lib/Makefile                      |   2 +-
>  lib/errseq.c                      | 208 ++++++++++++++++++++++++++++++++++++++
>  mm/filemap.c                      | 113 +++++++++++++++++----
>  mm/page-writeback.c               |  15 ++-
>  28 files changed, 628 insertions(+), 91 deletions(-)
>  create mode 100644 include/linux/errseq.h
>  create mode 100644 lib/errseq.c
> 

If there are no major objections to this set, I'd like to have
linux-next start picking it up to get some wider testing. What's the
right vehicle for this, given that it touches stuff all over the tree?

I can see 3 potential options:

1) I could just pull these into the branch that Stephen is already
picking up for file-locks in my tree

2) I could put them into a new branch, and have Stephen pull that one in
addition to the file-locks branch

3) It could go in via someone else's tree entirely (Andrew or Al's
maybe?)

I'm fine with any of these. Anyone have thoughts?

Thanks,
-- 
Jeff Layton <jlayton@xxxxxxxxxx>



[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux