There is a deadlock with the following call trace on process 10133:

PID 10133 not sceduled for 4403385ms (was on CPU[10])
  #0  context_switch() kernel/sched/core.c:3881
  #1  __schedule() kernel/sched/core.c:5111
  #2  schedule() kernel/sched/core.c:5186
  #3  xfs_extent_busy_flush() fs/xfs/xfs_extent_busy.c:598
  #4  xfs_alloc_ag_vextent_size() fs/xfs/libxfs/xfs_alloc.c:1641
  #5  xfs_alloc_ag_vextent() fs/xfs/libxfs/xfs_alloc.c:828
  #6  xfs_alloc_fix_freelist() fs/xfs/libxfs/xfs_alloc.c:2362
  #7  xfs_free_extent_fix_freelist() fs/xfs/libxfs/xfs_alloc.c:3029
  #8  __xfs_free_extent() fs/xfs/libxfs/xfs_alloc.c:3067
  #9  xfs_trans_free_extent() fs/xfs/xfs_extfree_item.c:370
  #10 xfs_efi_recover() fs/xfs/xfs_extfree_item.c:626
  #11 xlog_recover_process_efi() fs/xfs/xfs_log_recover.c:4605
  #12 xlog_recover_process_intents() fs/xfs/xfs_log_recover.c:4893
  #13 xlog_recover_finish() fs/xfs/xfs_log_recover.c:5824
  #14 xfs_log_mount_finish() fs/xfs/xfs_log.c:764
  #15 xfs_mountfs() fs/xfs/xfs_mount.c:978
  #16 xfs_fs_fill_super() fs/xfs/xfs_super.c:1908
  #17 mount_bdev() fs/super.c:1417
  #18 xfs_fs_mount() fs/xfs/xfs_super.c:1985
  #19 legacy_get_tree() fs/fs_context.c:647
  #20 vfs_get_tree() fs/super.c:1547
  #21 do_new_mount() fs/namespace.c:2843
  #22 do_mount() fs/namespace.c:3163
  #23 ksys_mount() fs/namespace.c:3372
  #24 __do_sys_mount() fs/namespace.c:3386
  #25 __se_sys_mount() fs/namespace.c:3383
  #26 __x64_sys_mount() fs/namespace.c:3383
  #27 do_syscall_64() arch/x86/entry/common.c:296
  #28 entry_SYSCALL_64() arch/x86/entry/entry_64.S:180

The process is waiting for xfs_perag.pagb_gen to increase, which happens
only when a busy extent is cleared. From the vmcore, it is waiting on AG 1,
and the ONLY busy extent for AG 1 belongs to the transaction (in
xfs_trans.t_busy) of process 10133 itself. That busy extent was created by
a previous EFI in the same transaction. Because process 10133 is waiting,
it has no chance to commit that transaction, so the busy extent can never
be cleared and pagb_gen remains unchanged. The result is a deadlock.
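
The wait condition described above can be illustrated with a small toy
model. This is only a sketch and not kernel code: the toy_* types and
toy_flush_can_make_progress() are hypothetical names made up for
illustration. It also simplifies the real behavior by assuming that a busy
extent is cleared only once its owning transaction has committed. Under
that assumption, a waiter can be woken only if some busy extent in the AG
is owned by a different transaction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical types, NOT the kernel structures. */
struct toy_busy_extent {
	int owner_tid;	/* transaction holding this extent on its busy list */
};

struct toy_perag {
	unsigned int gen;			/* stands in for pagb_gen */
	const struct toy_busy_extent *busy;	/* busy extents in this AG */
	size_t nbusy;
};

/*
 * The waiter sleeps until gen advances. In this model gen advances only
 * when a busy extent is cleared, and a busy extent is cleared only after
 * its owning transaction commits. A sleeping transaction cannot commit,
 * so the wait can end only if at least one busy extent is owned by a
 * transaction other than the waiter. The caller is assumed to wait only
 * when the AG has at least one busy extent.
 */
static bool toy_flush_can_make_progress(const struct toy_perag *pag,
					int waiter_tid)
{
	assert(pag != NULL);

	for (size_t i = 0; i < pag->nbusy; i++) {
		if (pag->busy[i].owner_tid != waiter_tid)
			return true;
	}
	return false;
}
```

With a single busy extent owned by the waiter, as in the vmcore above, the
function returns false: nothing else can advance the generation counter,
which is the deadlock.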

Commit 06058bc40534530e617e5623775c53bb24f032cb disallowed using busy
extents for any path that calls xfs_extent_busy_trim(). That looks like
overkill. AGFL block allocation just uses the first extent that satisfies
the request; it does not try further extents to choose a "better" one. So
it is safe to reuse a busy extent for the AGFL.

To fix the above deadlock, this patch allows reusing busy extents for the
AGFL.

Signed-off-by: Wengang Wang <wen.gang.wang@xxxxxxxxxx>
---
 fs/xfs/xfs_extent_busy.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/fs/xfs/xfs_extent_busy.c b/fs/xfs/xfs_extent_busy.c
index ef17c1f6db32..f857a5759506 100644
--- a/fs/xfs/xfs_extent_busy.c
+++ b/fs/xfs/xfs_extent_busy.c
@@ -344,6 +344,7 @@ xfs_extent_busy_trim(
 	ASSERT(*len > 0);
 
 	spin_lock(&args->pag->pagb_lock);
+restart:
 	fbno = *bno;
 	flen = *len;
 	rbp = args->pag->pagb_tree.rb_node;
@@ -362,6 +363,20 @@ xfs_extent_busy_trim(
 			continue;
 		}
 
+		/*
+		 * AGFL reserving (metadata) is just using the first-
+		 * fit extent, there is no optimization that tries further
+		 * extents. So it's safe to reuse the busy extent and safe
+		 * to update the busy extent.
+		 * Reuse for AGFL even busy extent being discarded.
+		 */
+		if (args->resv == XFS_AG_RESV_AGFL) {
+			if (!xfs_extent_busy_update_extent(args->mp, args->pag,
+					busyp, fbno, flen, false))
+				goto restart;
+			continue;
+		}
+
 		if (bbno <= fbno) {
 			/* start overlap */
 
-- 
2.21.0 (Apple Git-122.2)