On Thu, Aug 20, 2020 at 01:47:08PM -0400, Brian Foster wrote: > On Thu, Aug 20, 2020 at 10:25:12AM -0700, Darrick J. Wong wrote: > > On Thu, Aug 20, 2020 at 01:07:34PM -0400, Brian Foster wrote: > > > The inode chunk allocation transaction reserves inobt_maxlevels-1 > > > blocks to accommodate a full split of the inode btree. A full split > > > requires an allocation for every existing level and a new root > > > block, which means inobt_maxlevels is the worst case block > > > requirement for a transaction that inserts to the inobt. This can > > > lead to a transaction block reservation overrun when tmpfile > > > creation allocates an inode chunk and expands the inobt to its > > > maximum depth. This problem has been observed in conjunction with > > > overlayfs, which makes frequent use of tmpfiles internally. > > > > > > The existing reservation code goes back as far as the Linux git repo > > > history (v2.6.12). It was likely never observed as a problem because > > > the traditional file/directory creation transactions also include > > > worst case block reservation for directory modifications, which most > > > likely is able to make up for a single block deficiency in the inode > > > allocation portion of the calculation. tmpfile support is relatively > > > more recent (v3.15), less heavily used, and only includes the inode > > > allocation block reservation as tmpfiles aren't linked into the > > > directory tree on creation. > > > > > > Fix up the inode alloc block reservation macro and a couple of the > > > block allocator minleft parameters that enforce an allocation to > > > leave enough free blocks in the AG for a full inobt split. > > > > Looks all fine to me, but... does a similar logic apply to the other > > maxlevels uses in the kernel? > > > > fs/xfs/libxfs/xfs_trans_resv.c:73: blocks = num_ops * 2 * (2 * mp->m_ag_maxlevels - 1); > > fs/xfs/libxfs/xfs_trans_resv.c:75: blocks += max(num_ops * (2 * mp->m_rmap_maxlevels - 1), > > fs/xfs/libxfs/xfs_trans_resv.c:78: blocks += num_ops * (2 * mp->m_refc_maxlevels - 1); > > > > Can we end up in the same kind of situation with those other trees > > {bno,cnt,rmap,refc} where we have a maxlevels-1 tall tree and split each > > level all the way to the top? > > > > Hmm.. it seems so at first glance, but I'm not sure I follow the > calculations in that function. If we factor out the obvious > num_ops/num_trees components, the comment refers to the following > generic formula: > > ((2 blocks/level * max depth) - 1) > > I take it that since this is a log reservation calculation, the two > block/level multiplier is there because we have to move records between > two blocks for each level that splits. Is there a reason the -1 is > applied after that multiplier (as opposed to subtracting 1 from the max > depth first)? I'm wondering if that's intentional and it reflects that > the root level is only one block... Intentional, I think, because that's how btree splits work. :) i.e. split every level into 2 blocks, then add one for the new root. But when the tree is already at max height, we can't split the root block anymore so we are accounting for a split at every level except the root block, which is a single block.... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx