On Tue, Jun 05, 2018 at 09:56:03PM -0700, Darrick J. Wong wrote:
> On Wed, Jun 06, 2018 at 02:06:24PM +1000, Dave Chinner wrote:
> > On Tue, Jun 05, 2018 at 04:18:56PM -0700, Darrick J. Wong wrote:
> > > On Mon, Jun 04, 2018 at 11:52:55AM +1000, Dave Chinner wrote:
> > > > > +int
> > > > > +xfs_repair_mod_fdblocks(
> > > > > +	struct xfs_scrub_context	*sc,
> > > > > +	int64_t				delta_fdblocks)
> > > > > +{
> > > > > +	int				error;
> > > > > +
> > > > > +	if (delta_fdblocks == 0)
> > > > > +		return 0;
> > > > > +
> > > > > +	if (delta_fdblocks < 0) {
> > > > > +		error = xfs_trans_reserve_more(sc->tp, -delta_fdblocks, 0);
> > > > > +		if (error)
> > > > > +			return error;
> > > > > +	}
> > > > > +
> > > > > +	xfs_trans_mod_sb(sc->tp, XFS_TRANS_SB_FDBLOCKS, delta_fdblocks);
> > > >
> > > > This seems a little hacky - it's working around a transaction
> > > > reservation overflow warning, right?
> > >
> > > More than that -- we're trying to avoid the situation where the
> > > incore free block counter goes negative.
> >
> > Which will only happen if we overflow the transaction reservation,
> > yes?
> >
> > > Things go south pretty quickly when that happens because
> > > transaction reservations succeed when there's not enough free
> > > space to accommodate them.  We'd rather error out to userspace
> > > and have the admin unmount and xfs_repair than risk letting the
> > > fs really blow up.
> >
> > Sure, but I really don't like retrospective modification of
> > transaction reservations.  The repair code is already supposed to
> > have a reservation that is big enough to rebuild the AG trees, so
> > why should we need to reserve more space while rebuilding the AG
> > trees?
> >
> > > Note that this function has to be called before repair dirties
> > > anything in the repair transaction so we're still at a place
> > > where we could back out with no harm done.
> >
> > Still doesn't explain to me what the problem is that this code
> > works around.
> > And because I don't understand why it is necessary, this just
> > seems like a hack....
>
> It /is/ a hack while I figure out a sane strategy for checking the
> summary counters that doesn't regularly shut down the filesystem.
> I've thought that perhaps we should leave the global counters alone.
> If something corrupts agf_flcount to 0x1000032 (when the real
> flcount is 0x32) then we're going to subtract a huge quantity from
> the global counter, which is totally stupid.

But that sort of thing is easy to deal with via bounding:

	min(agf_flcount, XFS_AGFL_SIZE(mp))

I'd like to avoid hacks for the "near to ENOSPC" conditions for the
moment.  Repair being unreliable at ENOSPC, or even having repair
shut down because of unexpected ENOSPC, is fine for the initial
commits.  Document it as a problem that needs fixing and add it to
the list of things that need to be addressed before we can remove
the EXPERIMENTAL tag.

> I've been mulling over what to do here -- normally, repair always
> writes out fresh AG headers and summary counters and under normal
> circumstances we'll always keep the AG counts and the global counts
> in sync, right?

Userspace repair rebuilds the freespace and inode trees in each AG,
and the rebuild keeps its own count of the free space and the used
and free inodes tracked in the new versions of the trees.  Once all
AGs have been rebuilt, it sums the counts gathered in memory from
each AG, and then it calls sync_sb() to write those aggregated
counters back into the global superblock.

IOWs, userspace repair doesn't rely on the existing counters (on
disk or in memory) at all, nor does it try to keep them up to date
as it goes.  It keeps its own state and assumes nothing else is
modifying the filesystem, so it's always going to be correct.

> The idea I have to solve all these problems is to add a superblock
> state flag that we set whenever we think the global counters might
> be wrong.
> We'd only do this if we had to fix the counters in an AGF, or if
> the agfl padding fixer triggered, etc.

That's pretty much what I suggested via an "unclean unmount" state
flag.  That way a new mount would always trigger a resync.

> Then we add a new FSSUMMARY scrubber that only does anything if the
> 'counters might be wrong' flag is set.  When it does, we freeze the
> fs, tally up all the counters and reservations, and fix the
> counters if we can.  Then unfreeze and exit.  This way we're not
> locking the entire filesystem every time scrub runs, but we have a
> way to fix the global counters if we need to.  Granted, figuring
> out the total amount of incore reservation might not be quick.

Right - that's basically the problem doing it at mount time avoids:
having to freeze the filesystem for an unknown amount of time to fix
it up.

I agree with you that "unmount/mount" is not really an option for
fixing this up, and that freeze/fix/thaw is a much preferable
option.  However, this really is something that needs to be
scheduled for a maintenance period, not be done on a live production
filesystem where a freeze will violate performance SLAs....

IOWs, I think this "sync global counters" op needs to be a separate
admin-controlled repair operation, not something we do automatically
as part of the normal scrub/repair process.  i.e. when repair
completes, it tells the admin that:

**** IMPORTANT - Repair operations still pending ****

There are pending repair operations that need a quiesced filesystem
to perform.  Quiescing the filesystem will block all access to the
filesystem while the repair operation is being performed, so this
should be performed only during a scheduled maintenance period.
To perform the pending repair operations, please run:

	<repair prog> --quiesce --pending <mntpt>

**** /IMPORTANT ****

> The other repair functions no longer need xfs_repair_mod_fdblocks;
> if they have to make a correction to any of the per-ag free
> counters all they do is set the "wrong counters" flag.

*nod*

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx