Re: [PATCH 02/19] btrfs: handle checksum validation and repair at the storage layer

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jan 12, 2023 at 10:05:14AM +0100, Christoph Hellwig wrote:
> Currently btrfs handles checksum validation and repair in the end I/O
> handler for the btrfs_bio.  This leads to a lot of duplicate code
> plus issues with variying semantics or bugs, e.g.
> 
>  - the until recently broken repair for compressed extents
>  - the fact that encoded reads validate the checksums but do not kick
>    of read repair
>  - the inconsistent checking of the BTRFS_FS_STATE_NO_CSUMS flag
> 
> This commit revamps the checksum validation and repair code to instead
> work below the btrfs_submit_bio interfaces.  For this to work we need
> to make sure an inode is available, so that is added as a parameter
> to btrfs_bio_alloc.  With that btrfs_submit_bio can preload
> btrfs_bio.csum from the csum tree without help from the upper layers,
> and the low-level I/O completion can iterate over the bio and verify
> the checksums.
> 
> In case of a checksum failure (or a plain old I/O error), the repair
> is now kicked off before the upper level ->end_io handler is invoked.
> Tracking of the repair status is massively simplified by just keeping
> a small failed_bio structure per bio with failed sectors and otherwise
> using the information in the repair bio.  The per-inode I/O failure
> tree can be entirely removed.
> 
> The saved bvec_iter in the btrfs_bio is now competely managed by
> btrfs_submit_bio and must not be accessed by the callers.
> 
> There is one significant behavior change here:  If repair fails or
> is impossible to start with, the whole bio will be failed to the
> upper layer.  This is the behavior that all I/O submitters execept
> for buffered I/O already emulated in their end_io handler.  For
> buffered I/O this now means that a large readahead request can
> fail due to a single bad sector, but as readahead errors are igored
> the following readpage if the sector is actually accessed will
> still be able to read.  This also matches the I/O failure handling
> in other file systems.

I've tried several times to convince myself that this patch could be
merged as-is despite it's going against principles and standards I apply
to other patches.

The patch size itself is an instant red flag for a change that tries to
turn upside down some fundamental functionality like checksumming time.

The changelog sounds like a good cover letter for a series, overall
description but lacks more details.

The patch got 2 reviews, but this is a developer review, the code does
what's advertised and the reviewers were hopefully able to understand
it. While I do such review too, I also do a pass that applies criteria
like long term maintainability, potential risk of introducing hidden
bugs and the cost of resolving them later at a much higher cost.

>From your reply to v2 you do have some idea what can be split
(direct/buffered/ compressed IO) but don't see the value doing it.

How to split further:

- preparatory changes that only extend some interface (function,
  structure) and can take NULL or unused values before the actual switch
  happens, the logic of that is described separately

- code that does not need to be deleted in that same patch can be moved
  to a separate one, like with some other patches in the series

- the core change builds on the previous patches and describes how
  it's used, new logic, etc

- if the new repair io logic can live in parallel with the old
  io_failure_tree then remove the old logic separately, temporarily
  leaving unused structures eg. io_failure_tree

- use of mempool must be mentioned in the changelog with explanation
  that it's the safe usage pattern and why it cannot lead to lockups

I would feel bad about myself if I really have to close both my eyes,
turn around and do 'git apply' just to move forward with this patchset,
because your and Johannes' work depends on that. This would only
backfire.

So _please_ try to split the patch where the core logic that cannot be
split further is in a patch that is minimal in its size and does not
carry distracting changes.



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux