On Tue, Apr 21, 2020 at 07:08:20PM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > On filesystems that support them, bmap intent log items can be used to > change mappings in inode data or attr forks. However, if the bmbt must > expand, the enormous block reservations that we make for finishing > chains of deferred log items become a liability because the bmbt block > allocator sets minleft to the transaction reservation and there probably > aren't any AGs in the filesystem that have that much free space. > > Whereas previously we would reserve 93% of the free blocks in the > filesystem, now we only want to reserve 7/8ths of the free space in the > least full AG, and no more than half of the usable blocks in an AG. In > theory we shouldn't run out of space because (prior to the unclean > shutdown) all of the in-progress transactions successfully reserved the > worst case number of disk blocks. > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > --- > fs/xfs/xfs_log_recover.c | 55 ++++++++++++++++++++++++++++++++++++---------- > 1 file changed, 43 insertions(+), 12 deletions(-) > > > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c > index e9b3e901d009..a416b028b320 100644 > --- a/fs/xfs/xfs_log_recover.c > +++ b/fs/xfs/xfs_log_recover.c > @@ -2669,6 +2669,44 @@ xlog_recover_process_data( > return 0; > } > > +/* > + * Estimate a block reservation for a log recovery transaction. Since we run > + * separate transactions for each chain of deferred ops that get created as a > + * result of recovering unfinished log intent items, we must be careful not to > + * reserve so many blocks that block allocations fail because we can't satisfy > + * the minleft requirements (e.g. for bmbt blocks). > + */ > +static int > +xlog_estimate_recovery_resblks( > + struct xfs_mount *mp, > + unsigned int *resblks) > +{ > + struct xfs_perag *pag; > + xfs_agnumber_t agno; > + unsigned int free = 0; > + int error; > + > + /* Don't use more than 7/8th of the free space in the least full AG. */ > + for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) { > + unsigned int ag_free; > + > + error = xfs_alloc_pagf_init(mp, NULL, agno, 0); > + if (error) > + return error; > + pag = xfs_perag_get(mp, agno); > + ag_free = pag->pagf_freeblks + pag->pagf_flcount; > + free = max(free, (ag_free * 7) / 8); > + xfs_perag_put(pag); > + } > + Somewhat unfortunate that we have to iterate all AGs for each chain. I'm wondering if that has any effect on a large recovery on fs' with an inordinate AG count. Have you tested under those particular conditions? I suppose it's possible the recovery is slow enough that this won't matter... Also, perhaps not caused by this patch but does this outsized/manufactured reservation have the effect of artificially steering allocations to a particular AG if one happens to be notably larger than the rest? Brian > + /* Don't try to reserve more than half the usable AG blocks. */ > + *resblks = min(free, xfs_alloc_ag_max_usable(mp) / 2); > + if (*resblks == 0) > + return -ENOSPC; > + > + return 0; > +} > + > /* Take all the collected deferred ops and finish them in order. */ > static int > xlog_finish_defer_ops( > @@ -2677,27 +2715,20 @@ xlog_finish_defer_ops( > { > struct xfs_defer_freezer *dff, *next; > struct xfs_trans *tp; > - int64_t freeblks; > uint resblks; > int error = 0; > > list_for_each_entry_safe(dff, next, dfops_freezers, dff_list) { > + error = xlog_estimate_recovery_resblks(mp, &resblks); > + if (error) > + break; > + > /* > * We're finishing the defer_ops that accumulated as a result > * of recovering unfinished intent items during log recovery. > * We reserve an itruncate transaction because it is the > - * largest permanent transaction type. Since we're the only > - * user of the fs right now, take 93% (15/16) of the available > - * free blocks. Use weird math to avoid a 64-bit division. > + * largest permanent transaction type. > */ > - freeblks = percpu_counter_sum(&mp->m_fdblocks); > - if (freeblks <= 0) { > - error = -ENOSPC; > - break; > - } > - > - resblks = min_t(int64_t, UINT_MAX, freeblks); > - resblks = (resblks * 15) >> 4; > error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks, > 0, XFS_TRANS_RESERVE, &tp); > if (error) >