On Wed, Mar 14, 2018 at 01:17:24PM -0400, Brian Foster wrote: > The struct xfs_agfl v5 header was originally introduced with > unexpected padding that caused the AGFL to operate with one less > slot than intended. The header has since been packed, but the fix > left an incompatibility for users who upgrade from an old kernel > with the unpacked header to a newer kernel with the packed header > while the AGFL happens to wrap around the end. The newer kernel > recognizes one extra slot at the physical end of the AGFL that the > previous kernel did not. The new kernel will eventually attempt to > allocate a block from that slot, which contains invalid data, and > cause a crash. > > This condition can be detected by comparing the active range of the > AGFL to the count. While this detects a padding mismatch, it can > also trigger false positives for unrelated flcount corruption. Since > we cannot distinguish a size mismatch due to padding from unrelated > corruption, we can't trust the AGFL enough to simply repopulate the > empty slot. > > Instead, avoid unnecessarily complex detection logic and and use a > solution that can handle any form of flcount corruption that slips > through read verifiers: distrust the entire AGFL and reset it to an > empty state. Any valid blocks within the AGFL are intentionally > leaked. This requires xfs_repair to rectify (which was already > necessary based on the state the AGFL was found in). The reset > mitigates the side effect of the padding mismatch problem from a > filesystem crash to a free space accounting inconsistency. The > generic approach also means that this patch can be safely backported > to kernels with or without a packed struct xfs_agfl. > > Check the AGF for an invalid freelist count on initial read from > disk. If detected, set a flag on the xfs_perag to indicate that a > reset is required before the AGFL can be used. In the first > transaction that attempts to use a flagged AGFL, reset it to empty, > warn the user about the inconsistency and allow the freelist fixup > code to repopulate the AGFL with new blocks. The xfs_perag flag is > cleared to eliminate the need for repeated checks on each block > allocation operation. > > Suggested-by: Dave Chinner <david@xxxxxxxxxxxxx> > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > --- > > Hi all, > > Here's an actual patch for the AGFL reset approach. This survives a > normal xfstests run. I also ran some quick xfstests runs with hacks in > place to unconditionally trigger an agfl reset on first read. This > caused a ton of test failures due to the resulting accounting > inconsistency on the scratch device, but otherwise it survived from a > torture test perspective and didn't introduce any unrelated regressions > that I could see. > > Thoughts, reviews, flames appreciated. > > Brian > > v1: > - Use a simplified AGFL reset mechanism. > - Trigger on AGFL fixup rather than get/put. > - Various refactors, cleanups and comments. > rfc: https://marc.info/?l=linux-xfs&m=152045069616434&w=2 > > fs/xfs/libxfs/xfs_alloc.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++ > fs/xfs/xfs_mount.h | 1 + > fs/xfs/xfs_trace.h | 9 ++++- > 3 files changed, 105 insertions(+), 1 deletion(-) > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c > index 3db90b707fb2..7dfec92f28dc 100644 > --- a/fs/xfs/libxfs/xfs_alloc.c > +++ b/fs/xfs/libxfs/xfs_alloc.c > @@ -2063,6 +2063,95 @@ xfs_alloc_space_available( > } > > /* > + * Check the agfl fields of the agf for inconsistency or corruption. The purpose > + * is to detect an agfl header padding mismatch between current and early v5 > + * kernels. This problem manifests as a 1-slot size difference between the > + * on-disk flcount and the active [first, last] range of a wrapped agfl. This > + * may also catch variants of agfl count corruption unrelated to padding. Either > + * way, we'll reset the agfl and warn the user. > + * > + * Return true if a reset is required before the agfl can be used, false > + * otherwise. > + */ > +static bool > +xfs_agfl_need_reset( > + struct xfs_mount *mp, > + struct xfs_agf *agf) > +{ > + uint32_t f = be32_to_cpu(agf->agf_flfirst); > + uint32_t l = be32_to_cpu(agf->agf_fllast); > + uint32_t c = be32_to_cpu(agf->agf_flcount); > + int agfl_size = xfs_agfl_size(mp); > + int active; > + > + /* no agfl header on v4 supers */ > + if (!xfs_sb_version_hascrc(&mp->m_sb)) > + return false; > + > + /* > + * The agf read verifier catches severe corruption of these fields. > + * Repeat some sanity checks to cover a packed -> unpacked mismatch if > + * the verifier allows it. > + */ > + if (f >= agfl_size || l >= agfl_size) > + return true; > + if (c > agfl_size) > + return true; > + > + /* > + * Check consistency between the on-disk count and the active range. An > + * agfl padding mismatch manifests as an inconsistent flcount. > + */ > + if (c && l >= f) > + active = l - f + 1; > + else if (c) > + active = agfl_size - f + l + 1; > + else > + active = 0; > + if (active != c) > + return true; > + > + return false; This could become "return active != c;" But otherwise looks ok enough to try it out, Reviewed-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> --D > +} > + > +/* > + * Reset the agfl to an empty state. Ignore/drop any existing blocks since the > + * agfl content cannot be trusted. Warn the user that a repair is required to > + * recover leaked blocks. > + * > + * The purpose of this mechanism is to handle filesystems affected by the agfl > + * header padding mismatch problem. A reset keeps the filesystem online with a > + * relatively minor free space accounting inconsistency rather than suffer the > + * inevitable crash from use of an invalid agfl block. > + */ > +static void > +xfs_agfl_reset( > + struct xfs_trans *tp, > + struct xfs_buf *agbp, > + struct xfs_perag *pag) > +{ > + struct xfs_mount *mp = tp->t_mountp; > + struct xfs_agf *agf = XFS_BUF_TO_AGF(agbp); > + > + ASSERT(pag->pagf_agflreset); > + trace_xfs_agfl_reset(mp, agf, 0, _RET_IP_); > + > + xfs_warn(mp, > + "WARNING: Reset corrupted AGFL on AG %u. %d blocks leaked. " > + "Please unmount and run xfs_repair.", > + pag->pag_agno, pag->pagf_flcount); > + > + agf->agf_flfirst = 0; > + agf->agf_fllast = cpu_to_be32(xfs_agfl_size(mp) - 1); > + agf->agf_flcount = 0; > + xfs_alloc_log_agf(tp, agbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST | > + XFS_AGF_FLCOUNT); > + > + pag->pagf_flcount = 0; > + pag->pagf_agflreset = false; > +} > + > +/* > * Decide whether to use this allocation group for this allocation. > * If so, fix up the btree freelist's size. > */ > @@ -2123,6 +2212,10 @@ xfs_alloc_fix_freelist( > } > } > > + /* reset a padding mismatched agfl before final free space check */ > + if (pag->pagf_agflreset) > + xfs_agfl_reset(tp, agbp, pag); > + > /* If there isn't enough total space or single-extent, reject it. */ > need = xfs_alloc_min_freelist(mp, pag); > if (!xfs_alloc_space_available(args, need, flags)) > @@ -2279,6 +2372,7 @@ xfs_alloc_get_freelist( > agf->agf_flfirst = 0; > > pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno)); > + ASSERT(!pag->pagf_agflreset); > be32_add_cpu(&agf->agf_flcount, -1); > xfs_trans_agflist_delta(tp, -1); > pag->pagf_flcount--; > @@ -2390,6 +2484,7 @@ xfs_alloc_put_freelist( > agf->agf_fllast = 0; > > pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno)); > + ASSERT(!pag->pagf_agflreset); > be32_add_cpu(&agf->agf_flcount, 1); > xfs_trans_agflist_delta(tp, 1); > pag->pagf_flcount++; > @@ -2597,6 +2692,7 @@ xfs_alloc_read_agf( > pag->pagb_count = 0; > pag->pagb_tree = RB_ROOT; > pag->pagf_init = 1; > + pag->pagf_agflreset = xfs_agfl_need_reset(mp, agf); > } > #ifdef DEBUG > else if (!XFS_FORCED_SHUTDOWN(mp)) { > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h > index 1808f56decaa..10b90bbc5162 100644 > --- a/fs/xfs/xfs_mount.h > +++ b/fs/xfs/xfs_mount.h > @@ -353,6 +353,7 @@ typedef struct xfs_perag { > char pagi_inodeok; /* The agi is ok for inodes */ > uint8_t pagf_levels[XFS_BTNUM_AGF]; > /* # of levels in bno & cnt btree */ > + bool pagf_agflreset; /* agfl requires reset before use */ > uint32_t pagf_flcount; /* count of blocks in freelist */ > xfs_extlen_t pagf_freeblks; /* total free blocks */ > xfs_extlen_t pagf_longest; /* longest free space */ > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h > index 945de08af7ba..a982c0b623d0 100644 > --- a/fs/xfs/xfs_trace.h > +++ b/fs/xfs/xfs_trace.h > @@ -1477,7 +1477,7 @@ TRACE_EVENT(xfs_extent_busy_trim, > __entry->tlen) > ); > > -TRACE_EVENT(xfs_agf, > +DECLARE_EVENT_CLASS(xfs_agf_class, > TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags, > unsigned long caller_ip), > TP_ARGS(mp, agf, flags, caller_ip), > @@ -1533,6 +1533,13 @@ TRACE_EVENT(xfs_agf, > __entry->longest, > (void *)__entry->caller_ip) > ); > +#define DEFINE_AGF_EVENT(name) \ > +DEFINE_EVENT(xfs_agf_class, name, \ > + TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags, \ > + unsigned long caller_ip), \ > + TP_ARGS(mp, agf, flags, caller_ip)) > +DEFINE_AGF_EVENT(xfs_agf); > +DEFINE_AGF_EVENT(xfs_agfl_reset); > > TRACE_EVENT(xfs_free_extent, > TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t agbno, > -- > 2.13.6 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html