On Wed, Oct 21, 2020 at 09:21:08PM +0800, Gao Xiang wrote:
> On Wed, Oct 21, 2020 at 05:55:19AM -0400, Brian Foster wrote:
...
> > > > > >
> > > > > > Interesting... this seems fundamentally sane when narrowing the scope
> > > > > > down to tail AG shrinking. Does xfs_repair flag any issues in the simple
> > > > > > tail AG shrink case?
> > > > >
> > > > > Yeah, I ran xfs_repair together as well. For smaller sizes, it all seems
> > > > > fine, but I did observe some failures when much larger values were
> > > > > passed in, so as a formal patch, that really needs to be solved later.
> > > > >
> > > >
> > > > I'm curious to see what xfs_repair complained about if you have a record
> > > > of it. That might call out some other things we could be overlooking.
> > >
> > > Sorry for the somewhat slow progress...
> > >
> > > It could show a random "SB summary counter sanity check failed" runtime
> > > message when the shrink size is large (much closer to the AG start).
> >
> > Ok. That error looks associated with a few different checks:
> >
> > 	if (XFS_BUF_ADDR(bp) == XFS_SB_DADDR && !sbp->sb_inprogress &&
> > 	    (sbp->sb_fdblocks > sbp->sb_dblocks ||
> > 	     !xfs_verify_icount(mp, sbp->sb_icount) ||
> > 	     sbp->sb_ifree > sbp->sb_icount)) {
> > 		xfs_warn(mp, "SB summary counter sanity check failed");
> > 		return -EFSCORRUPTED;
> > 	}
> >
> > Though I think the inode counters should be a subset of allocated space
> > (i.e. inode chunks) and so are unlikely to be impacted by a removal of
> > free space. Without looking into the details, I'd guess it's most likely
> > just an accounting bug, and it's easiest to dump the relevant values that
> > land in the superblock and work backwards from there. FWIW, the follow-on
> > shutdown, repair (dirty log) and log recovery behavior (write and read
> > verifier failures) are typical and to be expected on metadata
> > corruption. IOW, I suspect that if we address the write verifier
> > failure, the follow-on issues will likely be resolved as well.
>
> After looking into it a little bit, the exact failure condition is
> sbp->sb_fdblocks > sbp->sb_dblocks,
>
> and it seems sbp->sb_fdblocks doesn't decrease as expected when the shrink
> size is large (in fact, it's still the same number as the original, unlike
> with a correct small shrink size). I'm still looking into what exactly is
> happening.
>

Update: the following incremental patch can fix the issue, yet I'm not sure
if it's the correct way or not...

Thanks,
Gao Xiang

diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 80927d323939..0a395901bc3f 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -149,6 +149,14 @@ xfs_growfs_data_private(
 			nb - mp->m_sb.sb_dblocks);
 	if (id.nfree)
 		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, id.nfree);
+
+	/*
+	 * Update the in-core counters (especially sb_fdblocks) now
+	 * so that xfs_validate_sb_write() can pass.
+	 */
+	if (xfs_sb_version_haslazysbcount(&mp->m_sb))
+		xfs_log_sb(tp);
+
 	xfs_trans_set_sync(tp);
 	error = xfs_trans_commit(tp);
 	if (error)