On Mon, Mar 12, 2018 at 01:35:53PM -0400, Brian Foster wrote: > On Mon, Mar 12, 2018 at 09:11:58AM -0400, Brian Foster wrote: > > On Fri, Mar 09, 2018 at 02:37:57PM -0800, Darrick J. Wong wrote: > > > On Fri, Mar 09, 2018 at 04:20:31PM -0500, Brian Foster wrote: > > > > On Fri, Mar 09, 2018 at 11:08:32AM -0800, Darrick J. Wong wrote: > > > > > On Fri, Mar 09, 2018 at 01:37:28PM -0500, Brian Foster wrote: > > > > > > On Fri, Mar 09, 2018 at 09:33:18AM -0800, Darrick J. Wong wrote: > ... > > > <nod> I've been mulling over rewriting your previous rfc patch that put > > > all the scanning/lifting in {get,put}_freelist but having it reset the > > > agfl instead of doing the swizzling stuff. > > > > > > > Something to be careful of... emptying the agfl obviously means the > > subsequent fixup is a btree block allocation. That may limit the context > > of where the fixup can be performed. IOW, deferring it to > > _get_freelist() might no longer be safe. Instead, I think we'd have to > > implement it such that the on-disk flcount is completely untrusted when > > the mismatch is detected. > > > ... > > Here's a variant of that patch that does a reset. It's definitely much > simpler. Thoughts? I like it - it is so much simpler than the other proposals, and it's done on demand rather than by a brute-force mount/unmount scan. > Brian > > --- 8< --- > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c > index c02781a4c091..7d313bb4677d 100644 > --- a/fs/xfs/libxfs/xfs_alloc.c > +++ b/fs/xfs/libxfs/xfs_alloc.c > @@ -2053,6 +2053,59 @@ xfs_alloc_space_available( > return true; > } > > +static bool > +xfs_agf_verify_flcount( > + struct xfs_mount *mp, > + struct xfs_agf *agf) Needs a comment explaining what it's checking and the return value. Might actually read better as "xfs_agf_need_flcount_reset()" returning true if fixup is required, especially at the caller site... > +{ > + int f = be32_to_cpu(agf->agf_flfirst); > + int l = be32_to_cpu(agf->agf_fllast); > + int c = be32_to_cpu(agf->agf_flcount); These should probably be uint32_t - they are unsigned on disk. > + int active = c; > + int agfl_size = XFS_AGFL_SIZE(mp); > + > + if (!xfs_sb_version_hascrc(&mp->m_sb)) > + return true; > + > + if (c && l >= f) > + active = l - f + 1; > + else if (c) > + active = agfl_size - f + l + 1; else active = 0; To move all the initialisation of active into the one logic statement and not make it dependent on the original value in the AGF? > + if (active != c) > + return false; > + if (f >= agfl_size || l >= agfl_size) > + return false; Shouldn't we be range checking these first? Also should probably checking c the same way. > + > + return true; > +} > + > +static void > +xfs_agfl_reset( > + struct xfs_trans *tp, > + struct xfs_buf *agbp, > + struct xfs_perag *pag) > +{ > + struct xfs_mount *mp = tp->t_mountp; > + struct xfs_agf *agf = XFS_BUF_TO_AGF(agbp); > + > + if (!pag->pagf_needreset) > + return; > + > + trace_xfs_agfl_reset(pag); > + xfs_warn(mp, "agfl reset agno %u flcount %d", pag->pag_agno, > + pag->pagf_flcount); > + > + agf->agf_flfirst = 0; > + agf->agf_fllast = cpu_to_be32(XFS_AGFL_SIZE(mp) - 1); > + agf->agf_flcount = 0; > + xfs_alloc_log_agf(tp, agbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST | > + XFS_AGF_FLCOUNT); > + > + pag->pagf_flcount = 0; > + pag->pagf_needreset = false; > +} > + > /* > * Decide whether to use this allocation group for this allocation. > * If so, fix up the btree freelist's size. > @@ -2119,6 +2172,9 @@ xfs_alloc_fix_freelist( > if (!xfs_alloc_space_available(args, need, flags)) > goto out_agbp_relse; > > + if (pag->pagf_needreset) > + xfs_agfl_reset(tp, agbp, pag); > + > /* > * Make the freelist shorter if it's too long. > * > @@ -2588,6 +2644,7 @@ xfs_alloc_read_agf( > pag->pagb_count = 0; > pag->pagb_tree = RB_ROOT; > pag->pagf_init = 1; > + pag->pagf_needreset = !xfs_agf_verify_flcount(mp, agf); > } > #ifdef DEBUG > else if (!XFS_FORCED_SHUTDOWN(mp)) { > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h > index e0792d036be2..5dd36920b7d6 100644 > --- a/fs/xfs/xfs_mount.h > +++ b/fs/xfs/xfs_mount.h > @@ -353,6 +353,7 @@ typedef struct xfs_perag { > char pagi_inodeok; /* The agi is ok for inodes */ > uint8_t pagf_levels[XFS_BTNUM_AGF]; > /* # of levels in bno & cnt btree */ > + bool pagf_needreset; > uint32_t pagf_flcount; /* count of blocks in freelist */ > xfs_extlen_t pagf_freeblks; /* total free blocks */ > xfs_extlen_t pagf_longest; /* longest free space */ > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h > index 945de08af7ba..678d602dc400 100644 > --- a/fs/xfs/xfs_trace.h > +++ b/fs/xfs/xfs_trace.h > @@ -3339,6 +3339,24 @@ TRACE_EVENT(xfs_trans_resv_calc, > __entry->logflags) > ); > > +TRACE_EVENT(xfs_agfl_reset, > + TP_PROTO(struct xfs_perag *pag), > + TP_ARGS(pag), > + TP_STRUCT__entry( > + __field(dev_t, dev) > + __field(xfs_agnumber_t, agno) > + __field(int, flcount) > + ), > + TP_fast_assign( > + __entry->dev = pag->pag_mount->m_super->s_dev; > + __entry->agno = pag->pag_agno; > + __entry->flcount = pag->pagf_flcount; > + ), > + TP_printk("dev %d:%d agno %u flcount %d", > + MAJOR(__entry->dev), MINOR(__entry->dev), > + __entry->agno, __entry->flcount) > +); Do we need a new tracepoint? Wouldn't it be better to just call trace_xfs_agf() which will dump all the information in the AGF just before we do the reset? We'll know it's a reset from the caller information that is dumped by that tracepoint.... > + > #endif /* _TRACE_XFS_H */ > > #undef TRACE_INCLUDE_PATH > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Dave Chinner david@xxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html