On Sun, Oct 02, 2022 at 11:19:48AM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong <djwong@xxxxxxxxxx> > > Currently, the only way to lock an allocation group is to hold the AGI > and AGF buffers. If repair needs to roll the transaction while > repairing some AG metadata, it maintains that lock by holding the two > buffers across the transaction roll and joins them afterwards. > > However, repair is not the same as the other parts of XFS that employ > this bhold/bjoin sequence, because it's possible that the AGI or AGF > buffers are not actually dirty before the roll. In this case, the > buffer log item can detach from the buffer, which means that we have to Doesn't this imply we have a reference counting problem with XFS_BLI_HOLD buffers? i.e. the bli can only get detached when the reference count on it goes to zero. If the buffer is clean and joined to a transaction, then that means the only reference to the BLI is the current transaction. Hence the only way it can get detached is for the transaction commit to release the current transaction's reference to the BLI. Ah, XFS_BLI_HOLD does not take a reference to the BLI - it just prevents ->iop_release from releasing the -buffer- after it drops the transaction reference to the BLI. That's the problem right there - xfs_buf_item_release() drops the current trans ref to the clean item via xfs_buf_item_release() regardless of whether BLI_HOLD is set or not, hence freeing the BLI on clean buffers. IOWs, it looks to me like XFS_BLI_HOLD should actually hold a reference to the BLI as well as the buffer so that we don't free the BLI for a held clean buffer in xfs_buf_item_release(). The reference we leave behind will then be picked up by the subsequent call to xfs_trans_bjoin() which finds the clean BLI already attached to the buffer... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx