On Fri, Mar 16, 2018 at 08:00:02AM -0400, Brian Foster wrote: > On Thu, Mar 15, 2018 at 12:31:30PM -0700, Darrick J. Wong wrote: > > On Thu, Mar 15, 2018 at 01:45:39PM -0400, Brian Foster wrote: > > > The struct xfs_agfl v5 header was originally introduced with > > > unexpected padding that caused the AGFL to operate with one less > > > slot than intended. The header has since been packed, but the fix > > > left an incompatibility for users who upgrade from an old kernel > > > with the unpacked header to a newer kernel with the packed header > > > while the AGFL happens to wrap around the end. The newer kernel > > > recognizes one extra slot at the physical end of the AGFL that the > > > previous kernel did not. The new kernel will eventually attempt to > > > allocate a block from that slot, which contains invalid data, and > > > cause a crash. > > > > > > This condition can be detected by comparing the active range of the > > > AGFL to the count. While this detects a padding mismatch, it can > > > also trigger false positives for unrelated flcount corruption. Since > > > we cannot distinguish a size mismatch due to padding from unrelated > > > corruption, we can't trust the AGFL enough to simply repopulate the > > > empty slot. > > > > > > Instead, avoid unnecessarily complex detection logic and and use a > > > solution that can handle any form of flcount corruption that slips > > > through read verifiers: distrust the entire AGFL and reset it to an > > > empty state. Any valid blocks within the AGFL are intentionally > > > leaked. This requires xfs_repair to rectify (which was already > > > necessary based on the state the AGFL was found in). The reset > > > mitigates the side effect of the padding mismatch problem from a > > > filesystem crash to a free space accounting inconsistency. The > > > generic approach also means that this patch can be safely backported > > > to kernels with or without a packed struct xfs_agfl. > > > > > > Check the AGF for an invalid freelist count on initial read from > > > disk. If detected, set a flag on the xfs_perag to indicate that a > > > reset is required before the AGFL can be used. In the first > > > transaction that attempts to use a flagged AGFL, reset it to empty, > > > warn the user about the inconsistency and allow the freelist fixup > > > code to repopulate the AGFL with new blocks. The xfs_perag flag is > > > cleared to eliminate the need for repeated checks on each block > > > allocation operation. > > > > > > Fixes: 96f859d52bcb ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct") > > > CC: <stable@xxxxxxxxxxxxxxx> > > > Suggested-by: Dave Chinner <david@xxxxxxxxxxxxx> > > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > > > Reviewed-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > > Reviewed-by Dave Chiluk <chiluk+linuxxfs@xxxxxxxxxx> > > > > Thanks, will test and apply to 4.17. > > > > Thanks. Re: Dave's comment on the 'Fixes:' tag [1]... > > I've locally dropped the tag and appended the following sentence to the > last paragraph in the commit log: > > "This allows kernels that include the packing fix commit 96f859d52bcb > ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct") > to handle older unpacked AGFL formats without a filesystem crash." > > I think that demonstrates intent without making the (already long) > commit log too much longer. Thoughts? I can send that as v3 if it's > easier, but I'd prefer to get this nailed down first if there are no > further comments on the patch. That seems fine to me. If there aren't any other changes to this patch I'll simply edit the commit message in my tree to drop the Fixes: tag and add the blurb above. Once this has stewed in upstream for a few weeks we can start to backport this to older kernels. I volunteer for 4.1 and 4.14, any takers for the others? I will also send out my xfstest for review. --D > Brian > > [1] https://marc.info/?l=linux-xfs&m=152115408908495&w=2 > > > --D > > > > > --- > > > > > > v2: > > > - Function rename and logic cleanup. > > > - CC stable. > > > v1: https://marc.info/?l=linux-xfs&m=152104784709048&w=2 > > > - Use a simplified AGFL reset mechanism. > > > - Trigger on AGFL fixup rather than get/put. > > > - Various refactors, cleanups and comments. > > > rfc: https://marc.info/?l=linux-xfs&m=152045069616434&w=2 > > > > > > fs/xfs/libxfs/xfs_alloc.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++ > > > fs/xfs/xfs_mount.h | 1 + > > > fs/xfs/xfs_trace.h | 9 ++++- > > > 3 files changed, 103 insertions(+), 1 deletion(-) > > > > > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c > > > index 3db90b707fb2..39387bdd225d 100644 > > > --- a/fs/xfs/libxfs/xfs_alloc.c > > > +++ b/fs/xfs/libxfs/xfs_alloc.c > > > @@ -2063,6 +2063,93 @@ xfs_alloc_space_available( > > > } > > > > > > /* > > > + * Check the agfl fields of the agf for inconsistency or corruption. The purpose > > > + * is to detect an agfl header padding mismatch between current and early v5 > > > + * kernels. This problem manifests as a 1-slot size difference between the > > > + * on-disk flcount and the active [first, last] range of a wrapped agfl. This > > > + * may also catch variants of agfl count corruption unrelated to padding. Either > > > + * way, we'll reset the agfl and warn the user. > > > + * > > > + * Return true if a reset is required before the agfl can be used, false > > > + * otherwise. > > > + */ > > > +static bool > > > +xfs_agfl_needs_reset( > > > + struct xfs_mount *mp, > > > + struct xfs_agf *agf) > > > +{ > > > + uint32_t f = be32_to_cpu(agf->agf_flfirst); > > > + uint32_t l = be32_to_cpu(agf->agf_fllast); > > > + uint32_t c = be32_to_cpu(agf->agf_flcount); > > > + int agfl_size = xfs_agfl_size(mp); > > > + int active; > > > + > > > + /* no agfl header on v4 supers */ > > > + if (!xfs_sb_version_hascrc(&mp->m_sb)) > > > + return false; > > > + > > > + /* > > > + * The agf read verifier catches severe corruption of these fields. > > > + * Repeat some sanity checks to cover a packed -> unpacked mismatch if > > > + * the verifier allows it. > > > + */ > > > + if (f >= agfl_size || l >= agfl_size) > > > + return true; > > > + if (c > agfl_size) > > > + return true; > > > + > > > + /* > > > + * Check consistency between the on-disk count and the active range. An > > > + * agfl padding mismatch manifests as an inconsistent flcount. > > > + */ > > > + if (c && l >= f) > > > + active = l - f + 1; > > > + else if (c) > > > + active = agfl_size - f + l + 1; > > > + else > > > + active = 0; > > > + > > > + return active != c; > > > +} > > > + > > > +/* > > > + * Reset the agfl to an empty state. Ignore/drop any existing blocks since the > > > + * agfl content cannot be trusted. Warn the user that a repair is required to > > > + * recover leaked blocks. > > > + * > > > + * The purpose of this mechanism is to handle filesystems affected by the agfl > > > + * header padding mismatch problem. A reset keeps the filesystem online with a > > > + * relatively minor free space accounting inconsistency rather than suffer the > > > + * inevitable crash from use of an invalid agfl block. > > > + */ > > > +static void > > > +xfs_agfl_reset( > > > + struct xfs_trans *tp, > > > + struct xfs_buf *agbp, > > > + struct xfs_perag *pag) > > > +{ > > > + struct xfs_mount *mp = tp->t_mountp; > > > + struct xfs_agf *agf = XFS_BUF_TO_AGF(agbp); > > > + > > > + ASSERT(pag->pagf_agflreset); > > > + trace_xfs_agfl_reset(mp, agf, 0, _RET_IP_); > > > + > > > + xfs_warn(mp, > > > + "WARNING: Reset corrupted AGFL on AG %u. %d blocks leaked. " > > > + "Please unmount and run xfs_repair.", > > > + pag->pag_agno, pag->pagf_flcount); > > > + > > > + agf->agf_flfirst = 0; > > > + agf->agf_fllast = cpu_to_be32(xfs_agfl_size(mp) - 1); > > > + agf->agf_flcount = 0; > > > + xfs_alloc_log_agf(tp, agbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST | > > > + XFS_AGF_FLCOUNT); > > > + > > > + pag->pagf_flcount = 0; > > > + pag->pagf_agflreset = false; > > > +} > > > + > > > +/* > > > * Decide whether to use this allocation group for this allocation. > > > * If so, fix up the btree freelist's size. > > > */ > > > @@ -2123,6 +2210,10 @@ xfs_alloc_fix_freelist( > > > } > > > } > > > > > > + /* reset a padding mismatched agfl before final free space check */ > > > + if (pag->pagf_agflreset) > > > + xfs_agfl_reset(tp, agbp, pag); > > > + > > > /* If there isn't enough total space or single-extent, reject it. */ > > > need = xfs_alloc_min_freelist(mp, pag); > > > if (!xfs_alloc_space_available(args, need, flags)) > > > @@ -2279,6 +2370,7 @@ xfs_alloc_get_freelist( > > > agf->agf_flfirst = 0; > > > > > > pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno)); > > > + ASSERT(!pag->pagf_agflreset); > > > be32_add_cpu(&agf->agf_flcount, -1); > > > xfs_trans_agflist_delta(tp, -1); > > > pag->pagf_flcount--; > > > @@ -2390,6 +2482,7 @@ xfs_alloc_put_freelist( > > > agf->agf_fllast = 0; > > > > > > pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno)); > > > + ASSERT(!pag->pagf_agflreset); > > > be32_add_cpu(&agf->agf_flcount, 1); > > > xfs_trans_agflist_delta(tp, 1); > > > pag->pagf_flcount++; > > > @@ -2597,6 +2690,7 @@ xfs_alloc_read_agf( > > > pag->pagb_count = 0; > > > pag->pagb_tree = RB_ROOT; > > > pag->pagf_init = 1; > > > + pag->pagf_agflreset = xfs_agfl_needs_reset(mp, agf); > > > } > > > #ifdef DEBUG > > > else if (!XFS_FORCED_SHUTDOWN(mp)) { > > > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h > > > index 1808f56decaa..10b90bbc5162 100644 > > > --- a/fs/xfs/xfs_mount.h > > > +++ b/fs/xfs/xfs_mount.h > > > @@ -353,6 +353,7 @@ typedef struct xfs_perag { > > > char pagi_inodeok; /* The agi is ok for inodes */ > > > uint8_t pagf_levels[XFS_BTNUM_AGF]; > > > /* # of levels in bno & cnt btree */ > > > + bool pagf_agflreset; /* agfl requires reset before use */ > > > uint32_t pagf_flcount; /* count of blocks in freelist */ > > > xfs_extlen_t pagf_freeblks; /* total free blocks */ > > > xfs_extlen_t pagf_longest; /* longest free space */ > > > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h > > > index 945de08af7ba..a982c0b623d0 100644 > > > --- a/fs/xfs/xfs_trace.h > > > +++ b/fs/xfs/xfs_trace.h > > > @@ -1477,7 +1477,7 @@ TRACE_EVENT(xfs_extent_busy_trim, > > > __entry->tlen) > > > ); > > > > > > -TRACE_EVENT(xfs_agf, > > > +DECLARE_EVENT_CLASS(xfs_agf_class, > > > TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags, > > > unsigned long caller_ip), > > > TP_ARGS(mp, agf, flags, caller_ip), > > > @@ -1533,6 +1533,13 @@ TRACE_EVENT(xfs_agf, > > > __entry->longest, > > > (void *)__entry->caller_ip) > > > ); > > > +#define DEFINE_AGF_EVENT(name) \ > > > +DEFINE_EVENT(xfs_agf_class, name, \ > > > + TP_PROTO(struct xfs_mount *mp, struct xfs_agf *agf, int flags, \ > > > + unsigned long caller_ip), \ > > > + TP_ARGS(mp, agf, flags, caller_ip)) > > > +DEFINE_AGF_EVENT(xfs_agf); > > > +DEFINE_AGF_EVENT(xfs_agfl_reset); > > > > > > TRACE_EVENT(xfs_free_extent, > > > TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t agbno, > > > -- > > > 2.13.6 > > > > > > -- > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > > the body of a message to majordomo@xxxxxxxxxxxxxxx > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@xxxxxxxxxxxxxxx > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html