On 06 May 2021 at 08:57, Dave Chinner wrote:
> On Wed, May 05, 2021 at 06:12:41PM +0530, Chandan Babu R wrote:
>> > Hence when doing allocation for the free list, we need to fail the
>> > allocation rather than block on the only remaining free extent in
>> > the AG. If we are freeing extents, the AGFL not being full is not an
>> > issue at all. And if we are allocating extents, the transaction
>> > reservations should have ensured that the AG had sufficient space in
>> > it to complete the entire operation without deadlocking waiting for
>> > space.
>> >
>> > Either way, I don't see a problem with making sure the AGFL
>> > allocations just skip busy extents and fail if the only free extents
>> > are ones this transaction has freed itself.
>> >
>>
>> Hmm. In the scenario where *all* free extents in the AG were originally
>> freed by the current transaction (and hence busy in the transaction),
>
> How does that happen?

I tried in vain to arrive at the above-mentioned scenario by consuming as
many blocks as possible from the filesystem. At best, I could arrive at an
AG with just one free extent record in the cntbt (NOTE: I had to disable
the global reservation by invoking "xfs_io -x -c 'resblks 0' $mntpnt"):

recs[1] = [startblock,blockcount]
1:[32767,1]

For each AG available in an FS instance, we take away 8 (i.e.
XFS_ALLOC_AGFL_RESERVE + 4) blocks from the global free data blocks
counter. This reservation is applied to the FS as a whole rather than to
each AG individually. Hence we can get to a scenario where an AG has fewer
than 8 free blocks.
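To spell out the arithmetic, the set-aside is a single FS-wide quantity
computed roughly as below (a sketch modelled on xfs_alloc_set_aside(); the
exact shape of that helper may differ between kernel versions):

/*
 * Each AG contributes XFS_ALLOC_AGFL_RESERVE + 4 (i.e. 8) blocks to one
 * FS-wide reservation taken out of the free data blocks counter. Since
 * this is a single global count rather than a per-AG reservation, nothing
 * guarantees that any individual AG keeps 8 free blocks.
 */
unsigned int
xfs_alloc_set_aside(
	struct xfs_mount	*mp)
{
	return mp->m_sb.sb_agcount * (XFS_ALLOC_AGFL_RESERVE + 4);
}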
I could not find any other restriction in the code that explicitly prevents
an AG from having zero free extents. However, I could not create such an AG
because any fs operation that needs an extent allocation tries to reserve
more than one extent, causing the above-cited AG to not be chosen.

>
>> we would need
>> to be able to recognize this situation and skip invoking
>> xfs_extent_busy_flush() altogether.
>
> If we are freeing extents (i.e XFS_ALLOC_FLAG_FREEING is set) and
> we are doing allocation for AGFL and we only found busy extents,
> then it's OK to fail the allocation.

When freeing an extent, the following patch skips allocating blocks to the
AGFL if all the free extents found are busy:

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index aaa19101bb2a..5310e311d5c6 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1694,6 +1694,7 @@ xfs_alloc_ag_vextent_size(
 	 * are no smaller extents available.
 	 */
 	if (!i) {
+alloc_small_extent:
 		error = xfs_alloc_ag_vextent_small(args, cnt_cur,
 						   &fbno, &flen, &i);
 		if (error)
@@ -1710,6 +1711,9 @@ xfs_alloc_ag_vextent_size(
 		/*
 		 * Search for a non-busy extent that is large enough.
 		 */
+		xfs_agblock_t orig_fbno = NULLAGBLOCK;
+		xfs_extlen_t orig_flen;
+
 		for (;;) {
 			error = xfs_alloc_get_rec(cnt_cur, &fbno, &flen, &i);
 			if (error)
@@ -1719,6 +1723,11 @@ xfs_alloc_ag_vextent_size(
 				goto error0;
 			}
 
+			if (orig_fbno == NULLAGBLOCK) {
+				orig_fbno = fbno;
+				orig_flen = flen;
+			}
+
 			busy = xfs_alloc_compute_aligned(args, fbno, flen,
 					&rbno, &rlen, &busy_gen);
 
@@ -1729,6 +1738,13 @@ xfs_alloc_ag_vextent_size(
 			if (error)
 				goto error0;
 			if (i == 0) {
+				if (args->freeing_extent) {
+					error = xfs_alloc_lookup_eq(cnt_cur,
+							orig_fbno, orig_flen, &i);
+					ASSERT(!error && i);
+					goto alloc_small_extent;
+				}
+
 				/*
 				 * Our only valid extents must have been busy.
 				 * Make it unbusy by forcing the log out and
@@ -1819,7 +1835,7 @@ xfs_alloc_ag_vextent_size(
 	 */
 	args->len = rlen;
 	if (rlen < args->minlen) {
-		if (busy) {
+		if (busy && !args->freeing_extent) {
 			xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
 			trace_xfs_alloc_size_busy(args);
 			xfs_extent_busy_flush(args->mp, args->pag, busy_gen);
@@ -2641,6 +2657,7 @@ xfs_alloc_fix_freelist(
 	targs.alignment = targs.minlen = targs.prod = 1;
 	targs.type = XFS_ALLOCTYPE_THIS_AG;
 	targs.pag = pag;
+	targs.freeing_extent = flags & XFS_ALLOC_FLAG_FREEING;
 	error = xfs_alloc_read_agfl(mp, tp, targs.agno, &agflbp);
 	if (error)
 		goto out_agbp_relse;
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index a4427c5775c2..1e0fc28ef87a 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -78,6 +78,7 @@ typedef struct xfs_alloc_arg {
 #ifdef DEBUG
 	bool		alloc_minlen_only; /* allocate exact minlen extent */
 #endif
+	bool		freeing_extent;	/* caller is freeing extents */
 } xfs_alloc_arg_t;
 
 /*

With the above patch, xfs/538 causes the following call trace to be printed:

XFS (vdc2): Internal error i != 1 at line 3426 of file fs/xfs/libxfs/xfs_btree.c. Caller xfs_btree_insert+0x15c/0x1f0
CPU: 2 PID: 1284 Comm: punch-alternati Not tainted 5.12.0-rc8-next-20210419-chandan #19
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Call Trace:
 dump_stack+0x64/0x7c
 xfs_corruption_error+0x85/0x90
 ? xfs_btree_insert+0x15c/0x1f0
 xfs_btree_insert+0x18d/0x1f0
 ? xfs_btree_insert+0x15c/0x1f0
 ? xfs_allocbt_init_common+0x30/0xf0
 xfs_free_ag_extent+0x463/0x9d0
 __xfs_free_extent+0xe5/0x200
 xfs_trans_free_extent+0x3e/0x100
 xfs_extent_free_finish_item+0x24/0x40
 xfs_defer_finish_noroll+0x1f7/0x5c0
 __xfs_trans_commit+0x12f/0x300
 xfs_free_file_space+0x1af/0x2c0
 xfs_file_fallocate+0x1ca/0x430
 ? __cond_resched+0x16/0x40
 ? inode_security+0x22/0x60
 ? selinux_file_permission+0xe2/0x120
 vfs_fallocate+0x146/0x2e0
 __x64_sys_fallocate+0x3e/0x70
 do_syscall_64+0x40/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

The above call trace occurs during execution of step #2 listed below:
1. Fill up 90% of the free space of the filesystem.
2. Punch alternate blocks of files.

Just before the failure, the filesystem had ~9000 busy extents. So I think
we have to flush busy extents even when refilling the AGFL for the purpose
of freeing an extent.

>
> We have options here - once we get to the end of the btree and
> haven't found a candidate that isn't busy, we could fail
> immediately. Or maybe we try an optimistic flush which forces the
> log and waits for a short while (instead of forever) for the
> generation to change and then fail if we get a timeout response. Or
> maybe there's a more elegant way of doing this that hasn't yet
> rattled out of my poor, overloaded brain right now.
>
> Just because we currently do a blocking flush doesn't mean we always
> must do a blocking flush....

I will try to work out a solution.

--
chandan