Re: [PATCH v2] xfs: detect agfl count corruption and reset agfl

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Mar 23, 2018 at 07:46:22AM -0400, Brian Foster wrote:
> On Fri, Mar 23, 2018 at 01:11:38PM +0800, Eryu Guan wrote:
> > Hi Brian,
> > 
> > On Thu, Mar 15, 2018 at 01:45:39PM -0400, Brian Foster wrote:
> > > The struct xfs_agfl v5 header was originally introduced with
> > > unexpected padding that caused the AGFL to operate with one less
> > > slot than intended. The header has since been packed, but the fix
> > > left an incompatibility for users who upgrade from an old kernel
> > > with the unpacked header to a newer kernel with the packed header
> > > while the AGFL happens to wrap around the end. The newer kernel
> > > recognizes one extra slot at the physical end of the AGFL that the
> > > previous kernel did not. The new kernel will eventually attempt to
> > > allocate a block from that slot, which contains invalid data, and
> > > cause a crash.
> > > 
> > > This condition can be detected by comparing the active range of the
> > > AGFL to the count. While this detects a padding mismatch, it can
> > > also trigger false positives for unrelated flcount corruption. Since
> > > we cannot distinguish a size mismatch due to padding from unrelated
> > > corruption, we can't trust the AGFL enough to simply repopulate the
> > > empty slot.
> > > 
> > > Instead, avoid unnecessarily complex detection logic and and use a
> > > solution that can handle any form of flcount corruption that slips
> > > through read verifiers: distrust the entire AGFL and reset it to an
> > > empty state. Any valid blocks within the AGFL are intentionally
> > > leaked. This requires xfs_repair to rectify (which was already
> > > necessary based on the state the AGFL was found in). The reset
> > > mitigates the side effect of the padding mismatch problem from a
> > > filesystem crash to a free space accounting inconsistency. The
> > > generic approach also means that this patch can be safely backported
> > > to kernels with or without a packed struct xfs_agfl.
> > > 
> > > Check the AGF for an invalid freelist count on initial read from
> > > disk. If detected, set a flag on the xfs_perag to indicate that a
> > > reset is required before the AGFL can be used. In the first
> > > transaction that attempts to use a flagged AGFL, reset it to empty,
> > > warn the user about the inconsistency and allow the freelist fixup
> > > code to repopulate the AGFL with new blocks. The xfs_perag flag is
> > > cleared to eliminate the need for repeated checks on each block
> > > allocation operation.
> > > 
> > > Fixes: 96f859d52bcb ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct")
> > > CC: <stable@xxxxxxxxxxxxxxx>
> > > Suggested-by: Dave Chinner <david@xxxxxxxxxxxxx>
> > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> > > Reviewed-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > Reviewed-by Dave Chiluk <chiluk+linuxxfs@xxxxxxxxxx>
> > > ---
> > > 
> > > v2:
> > > - Function rename and logic cleanup.
> > > - CC stable.
> > > v1: https://marc.info/?l=linux-xfs&m=152104784709048&w=2
> > > - Use a simplified AGFL reset mechanism.
> > > - Trigger on AGFL fixup rather than get/put.
> > > - Various refactors, cleanups and comments.
> > > rfc: https://marc.info/?l=linux-xfs&m=152045069616434&w=2
> > > 
> > >  fs/xfs/libxfs/xfs_alloc.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++
> > >  fs/xfs/xfs_mount.h        |  1 +
> > >  fs/xfs/xfs_trace.h        |  9 ++++-
> > >  3 files changed, 103 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > > index 3db90b707fb2..39387bdd225d 100644
> > > --- a/fs/xfs/libxfs/xfs_alloc.c
> > > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > > @@ -2063,6 +2063,93 @@ xfs_alloc_space_available(
> > >  }
> > >  
> > >  /*
> > > + * Check the agfl fields of the agf for inconsistency or corruption. The purpose
> > > + * is to detect an agfl header padding mismatch between current and early v5
> > > + * kernels. This problem manifests as a 1-slot size difference between the
> > > + * on-disk flcount and the active [first, last] range of a wrapped agfl. This
> > > + * may also catch variants of agfl count corruption unrelated to padding. Either
> > > + * way, we'll reset the agfl and warn the user.
> > > + *
> > > + * Return true if a reset is required before the agfl can be used, false
> > > + * otherwise.
> > > + */
> > > +static bool
> > > +xfs_agfl_needs_reset(
> > > +	struct xfs_mount	*mp,
> > > +	struct xfs_agf		*agf)
> > > +{
> > > +	uint32_t		f = be32_to_cpu(agf->agf_flfirst);
> > > +	uint32_t		l = be32_to_cpu(agf->agf_fllast);
> > > +	uint32_t		c = be32_to_cpu(agf->agf_flcount);
> > > +	int			agfl_size = xfs_agfl_size(mp);
> >                                             ^^^^^^^^^^^^^
> > Should this be XFS_AGFL_SIZE(mp)? Otherwise I got compile error (based
> > on 4.15-rc5 kernel):
> > 
> 
> This patch was rebased onto for-next which now includes commit
> a78ee256c325e ("xfs: convert XFS_AGFL_SIZE to a helper function"). That
> commit refactors the macro into a helper function.

Ah, thanks! I did search the list (roughly) to see if there's any patch
introduced xfs_agfl_size but apparently missed this patch.

Thanks,
Eryu



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]