On Tue, Nov 21, 2017 at 12:35:57PM -0500, Brian Foster wrote: > cc linux-xfs > > On Tue, Nov 21, 2017 at 10:51:10AM -0600, Mark Tinguely wrote: > > STATIC uint > > xfs_calc_ifree_reservation( > > struct xfs_mount *mp) > > { > > return XFS_DQUOT_LOGRES(mp) + > > mp->m_sb.sb_inodesize + > > mp->m_sb.sb_sectsize + > > mp->m_sb.sb_sectsize + > > XFS_FSB_TO_B(mp, 1) + > > MAX((__uint16_t)XFS_FSB_TO_B(mp, 1), > > XFS_INODE_CLUSTER_SIZE(mp)) + > > 128 * 5 + > > XFS_ALLOCFREE_LOG_RES(mp, 1) + > > 128 * (2 + XFS_IALLOC_BLOCKS(mp) + mp->m_in_maxlevels + > > XFS_ALLOCFREE_LOG_COUNT(mp, 1)); > > } > > ... > > Indeed, this looks fairly similar to the line cited above from the > current code (modulo the xfs_calc_buf_res() refactoring). Right, the refactoring should have been a straight conversion of the code - and not change any of the reservations. If they were wrong before the refactoring, they will still be wrong now. > I'm curious > why we only consider the buffer overhead (i.e., 128 bytes * > ->m_in_maxlevels) of the ->m_in_maxlevels value rather than the full > buffer size. That just looks like a bug. Earlier on in the calculation we take into account: * the inode btree entry: block size .... xfs_calc_buf_res(1, XFS_FSB_TO_B(mp, 1)) + So we've accounted for the AGI btree block the record being modified exists in. But we don't account for tree shape changes.... .... and, as usual, knowing the history of the code tells me the probable reason why the reservation is like this. i.e. inodes could not be removed from disk when the original reservation was defined. Once allocated, they never got removed, so an ifree transaction only ever touched the inode btree block the chunk record was in. So, a quick dive into history from the XFS archive with git blame: 814994f5c75f6 (Adam Sweeney 1994-09-23 * In freeing an inode we can modify: 814994f5c75f6 (Adam Sweeney 1994-09-23 * the inode being freed: inode size 814994f5c75f6 (Adam Sweeney 1994-09-23 * the super block free inode counter: sector size 814994f5c75f6 (Adam Sweeney 1994-09-23 * the agi hash list and counters: sector size 814994f5c75f6 (Adam Sweeney 1994-09-23 * the inode btree entry: block size cd235688e046e (Adam Sweeney 1995-08-31 * the on disk inode before ours in the agi hash list: inode cluster size 254f6311ed1b7 (Steve Lord 2003-10-06 * the inode btree: max depth * blocksize 254f6311ed1b7 (Steve Lord 2003-10-06 * the allocation btrees: 2 trees * (max depth - 1) * block size 814994f5c75f6 (Adam Sweeney 1994-09-23 */ d3c15dd08b409 (Russell Cattelan 2003-04-15 #define XFS_CALC_IFREE_LOG_RES(mp) \ 814994f5c75f6 (Adam Sweeney 1994-09-23 ((mp)->m_sb.sb_inodesize + \ 814994f5c75f6 (Adam Sweeney 1994-09-23 (mp)->m_sb.sb_sectsize + \ 814994f5c75f6 (Adam Sweeney 1994-09-23 (mp)->m_sb.sb_sectsize + \ 814994f5c75f6 (Adam Sweeney 1994-09-23 XFS_FSB_TO_B((mp), 1) + \ 0139225509eb1 (Steve Lord 2001-09-11 MAX((__uint16_t)XFS_FSB_TO_B((mp), 1), XFS_INODE_CLUSTER_SIZE(mp)) + \ 254f6311ed1b7 (Steve Lord 2003-10-06 (128 * 5) + \ 254f6311ed1b7 (Steve Lord 2003-10-06 (128 * (2 + XFS_IALLOC_BLOCKS(mp) + XFS_IN_MAXLEVELS(mp) + \ 254f6311ed1b7 (Steve Lord 2003-10-06 XFS_ALLOCFREE_LOG_COUNT(mp, 1)))) 254f6311ed1b7 (Steve Lord 2003-10-06 Without even looking I know what those 2003 changes were. For everyone else: 254f6311ed1b Implement deletion of inode clusters in XFS. So, yeah, this looks like a bug in the original reservation calculation for freeing inode clusters. I suspect it got messed up because the buffers covering XFS_IALLOC_BLOCKS (the inode chunk) get marked stale and so their contents are not logged. That means just the header+blf is logged and so the reservation is correct for them. that isn't the case for the inobt blocks themselves, however. i.e. it should be calculating: .... * the inode chunk blocks are stale, so only headers get logged * the inode btree: max depth * blocksize .... xfs_calc_buf_res(2 + mp->m_ialloc_blks, 0) + xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1)) + .... I'm still not 100% sure what the extra "2 +" is accounting for there, though, so to be safe that migh need to be accounted as full blocks, not log headers. Good find, Brian! Of course, that doesn't mean it will fix the transaction overrun being reported because it's overrunning the freespace tree reservations rather than the inobt reservation. It certainly won't hurt, though. :P Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html