On Fri, Apr 04, 2014 at 09:37:01AM -0400, Brian Foster wrote: > On Fri, Mar 21, 2014 at 09:11:49PM +1100, Dave Chinner wrote: > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > When we punch a hole in a delalloc extent, we split the indirect > > block reservation between the two new extents. If we repeatedly > > punch holes in a large delalloc extent, that reservation will > > eventually run out and we'll assert fail in xfs_bunmapi() because > > the indirect block reservation for the delalloc extent is zero. This > > is caused by doing a large delalloc write, then zeroing multiple > > ranges of that write using fallocate to punch lots of holes in the > > delayed allocation range. > > > > To avoid this problem, if we split the reservation and require more > > indirect blocks for the two new extents than we had for the old > > reservation, steal the additional blocks from the hole we punched in > > the extent. In most cases we only need a single extra block, so even > > if we punch only single block holes we can still retain sufficient > > indirect block reservations to avoid problems. > > > > In doing this "stealing", we need to change where we account for the > > delalloc blocks being freed. The block count held on the inode does > > not take into account the indirect block reservation, so we still > > need to do that before we free the extent. However, the accounting > > ofr free space in the superblock need to be done after we've stolen > > the blocks fro the freed extent so that they are accounted for > > correctly. > > > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > > --- > > fs/xfs/xfs_bmap.c | 65 ++++++++++++++++++++++++++++++++++++++++--------------- > > 1 file changed, 47 insertions(+), 18 deletions(-) > > > > diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c > > index 5b6092e..4bf6a0e 100644 > > --- a/fs/xfs/xfs_bmap.c > > +++ b/fs/xfs/xfs_bmap.c > > @@ -4945,7 +4945,27 @@ xfs_bmap_del_extent( > > temp2 = xfs_bmap_worst_indlen(ip, temp2); > > new.br_startblock = nullstartblock((int)temp2); > > da_new = temp + temp2; > > + > > + /* > > + * Note: if we have an odd number of blocks reserved, > > + * then if we keep splitting the delalloc extent like > > + * this we end up with a delalloc indlen reservation of > > + * zero for one of the two extents. Hence if we end > > + * up with the new indlen reservations being larger than > > + * the old one, steal blocks from the data reservation > > + * we just punched out. Otherwise, just reduce the > > + * remaining indlen reservations alternately and hope > > + * next time we come here the range getting removed is > > + * large enough to fix this all up. > > + */ > > while (da_new > da_old) { > > + if (del->br_blockcount) { > > + /* steal a block */ > > + da_new--; > > + del->br_blockcount--; > > + continue; > > + } > > + > > if (temp) { > > temp--; > > da_new--; > > @@ -5255,24 +5275,6 @@ xfs_bunmapi( > > } > > if (wasdel) { > > ASSERT(startblockval(del.br_startblock) > 0); > > - /* Update realtime/data freespace, unreserve quota */ > > - if (isrt) { > > - xfs_filblks_t rtexts; > > - > > - rtexts = XFS_FSB_TO_B(mp, del.br_blockcount); > > - do_div(rtexts, mp->m_sb.sb_rextsize); > > - xfs_mod_incore_sb(mp, XFS_SBS_FREXTENTS, > > - (int64_t)rtexts, 0); > > - (void)xfs_trans_reserve_quota_nblks(NULL, > > - ip, -((long)del.br_blockcount), 0, > > - XFS_QMOPT_RES_RTBLKS); > > - } else { > > - xfs_icsb_modify_counters(mp, XFS_SBS_FDBLOCKS, > > - (int64_t)del.br_blockcount, 0); > > - (void)xfs_trans_reserve_quota_nblks(NULL, > > - ip, -((long)del.br_blockcount), 0, > > - XFS_QMOPT_RES_REGBLKS); > > - } > > ip->i_delayed_blks -= del.br_blockcount; > > if (cur) > > cur->bc_private.b.flags |= > > @@ -5302,6 +5304,33 @@ xfs_bunmapi( > > } > > error = xfs_bmap_del_extent(ip, tp, &lastx, flist, cur, &del, > > &tmp_logflags, whichfork); > > + /* > > + * xfs_bmap_del_extent may hand delayed alloc blocks back to the > > + * indirect block reservations to keep extent split reservations > > + * sane. Hence we should only decrement the delayed block count > > + * on the inode once we know exactly the amount of delalloc > > + * space we actually removed from the inode. > > + */ > > + if (wasdel && del.br_blockcount) { > > + /* Update realtime/data freespace, unreserve quota */ > > + if (isrt) { > > + xfs_filblks_t rtexts; > > + > > + rtexts = XFS_FSB_TO_B(mp, del.br_blockcount); > > + do_div(rtexts, mp->m_sb.sb_rextsize); > > + xfs_mod_incore_sb(mp, XFS_SBS_FREXTENTS, > > + (int64_t)rtexts, 0); > > + (void)xfs_trans_reserve_quota_nblks(NULL, > > + ip, -((long)del.br_blockcount), 0, > > + XFS_QMOPT_RES_RTBLKS); > > + } else { > > + xfs_icsb_modify_counters(mp, XFS_SBS_FDBLOCKS, > > + (int64_t)del.br_blockcount, 0); > > + (void)xfs_trans_reserve_quota_nblks(NULL, > > + ip, -((long)del.br_blockcount), 0, > > + XFS_QMOPT_RES_REGBLKS); > > + } > > + } > > In xfs_bmapi_reserve_delalloc(), we account alen against the quota, > calculate indlen, account both against the sb counters, add alen to > i_delayed_blks and continue on... *nod* > In xfs_bunmap(), we remove br_blockcount from the sb counters and > unreserve from the quota then call into xfs_bmap_del_extent(). The > latter takes care of removing the indlen blocks from the sb counters if > necessary. *nod* > With these changes, we move the accounting after the del_extent() call > and allow the latter to steal from br_blockcount for indlen. This seems > like it works for the sb counters. *nod* > We also adjust i_delayed_blks against > the original extent length before it can be modified. The quota > reservation looks like it could become inconsistent, however. E.g., some > blocks previously accounted against the quota (alen) are now moved over > to indlen. Yes, you are right - it will cause inconsistency. That's a lesser evil than not having a reservation at all, but I can probably fix that up. > If those indlen blocks happen to be cleaned up through > del_extent(), they'd only be replenished to the sb counters (near the > end of del_extent()). Perhaps we should leave the quota unreserve prior > to the del_extent() call or sample the original br_blockcount and use it > post del_extent()..? That's a possibility. I really only looked at fixing the reservation issue and keeping the sb counters correct, not the quota aspect of it. I'll go back and see what I can come up with that solves this problem, too. Thanks! Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs