> > I have root caused the bug and hope to post the patches soon.

Sorry, I had forgotten about this bug. Luis reminded me about this recently.

The patches I had written turned out to be incorrect. However, the following
provides the root cause analysis of the bug.

Executing xfs/538 on a Linux v6.6 kernel can lead to the following deadlock,

|------------------+------------------+------------------|
| Task A           | Task B           | Task C           |
|------------------+------------------+------------------|
| Lock AG 1's AGF  |                  |                  |
|                  | Lock AG 2's AGI  |                  |
|                  | Wait for lock on |                  |
|                  | AG 1's AGF       |                  |
|                  |                  | Lock AG 3's AGF  |
|                  |                  | Wait for lock on |
|                  |                  | AG 2's AGI       |
| Wait for lock on |                  |                  |
| AG 3's AGF       |                  |                  |
|------------------+------------------+------------------|

As illustrated above, Task B and C are violating the AG locking order rule
i.e. AGI/AGF must be locked in increasing AG order and that within an AG, AGI
must be locked before an AGF.

Task B's call trace:

  context_switch (kernel/sched/core.c:5382:2)
  __schedule (kernel/sched/core.c:6695:8)
  schedule (kernel/sched/core.c:6771:3)
  schedule_timeout (kernel/time/timer.c:2143:3)
  ___down_common (kernel/locking/semaphore.c:225:13)
  __down_common (kernel/locking/semaphore.c:246:8)
  down (kernel/locking/semaphore.c:63:3)
  xfs_buf_lock (fs/xfs/xfs_buf.c:1126:2)
  xfs_buf_find_lock (fs/xfs/xfs_buf.c:553:3)
  xfs_buf_lookup (fs/xfs/xfs_buf.c:592:10)
  xfs_buf_get_map (fs/xfs/xfs_buf.c:702:10)
  xfs_buf_read_map (fs/xfs/xfs_buf.c:817:10)
  xfs_trans_read_buf_map (fs/xfs/xfs_trans_buf.c:289:10)
  xfs_trans_read_buf (./fs/xfs/xfs_trans.h:212:9)
  xfs_read_agf (fs/xfs/libxfs/xfs_alloc.c:3153:10)
  xfs_alloc_read_agf (fs/xfs/libxfs/xfs_alloc.c:3185:10)
  xfs_alloc_fix_freelist (fs/xfs/libxfs/xfs_alloc.c:2658:11)
  xfs_alloc_vextent_prepare_ag (fs/xfs/libxfs/xfs_alloc.c:3321:10)
  xfs_alloc_vextent_iterate_ags (fs/xfs/libxfs/xfs_alloc.c:3506:11)
  xfs_alloc_vextent_first_ag (fs/xfs/libxfs/xfs_alloc.c:3641:10)
  xfs_bmap_exact_minlen_extent_alloc (fs/xfs/libxfs/xfs_bmap.c:3434:10)
  xfs_bmap_alloc_userdata (fs/xfs/libxfs/xfs_bmap.c:4084:10)
  xfs_bmapi_allocate (fs/xfs/libxfs/xfs_bmap.c:4129:11)
  xfs_bmapi_write (fs/xfs/libxfs/xfs_bmap.c:4438:12)
  xfs_symlink (fs/xfs/xfs_symlink.c:271:11)
  xfs_vn_symlink (fs/xfs/xfs_iops.c:419:10)
  vfs_symlink (fs/namei.c:4480:10)
  vfs_symlink (fs/namei.c:4464:5)
  do_symlinkat (fs/namei.c:4506:11)
  __do_sys_symlink (fs/namei.c:4527:9)
  __se_sys_symlink (fs/namei.c:4525:1)
  __x64_sys_symlink (fs/namei.c:4525:1)
  do_syscall_x64 (arch/x86/entry/common.c:50:14)
  do_syscall_64 (arch/x86/entry/common.c:80:7)
  entry_SYSCALL_64+0xaa/0x1a6 (arch/x86/entry/entry_64.S:120)

Task B above locked AG 2's AGI, allocated an ondisk inode, then tried to
allocate blocks (required for holding pathname representing the symbolic link)
from AG 1. This happened due to xfs_bmap_exact_minlen_extent_alloc() iterating
across AGs starting from AG 0.

Task C's call trace:

  context_switch (kernel/sched/core.c:5382:2)
  __schedule (kernel/sched/core.c:6695:8)
  schedule (kernel/sched/core.c:6771:3)
  schedule_timeout (kernel/time/timer.c:2143:3)
  ___down_common (kernel/locking/semaphore.c:225:13)
  __down_common (kernel/locking/semaphore.c:246:8)
  down (kernel/locking/semaphore.c:63:3)
  xfs_buf_lock (fs/xfs/xfs_buf.c:1126:2)
  xfs_buf_find_lock (fs/xfs/xfs_buf.c:553:3)
  xfs_buf_lookup (fs/xfs/xfs_buf.c:592:10)
  xfs_buf_get_map (fs/xfs/xfs_buf.c:702:10)
  xfs_buf_read_map (fs/xfs/xfs_buf.c:817:10)
  xfs_trans_read_buf_map (fs/xfs/xfs_trans_buf.c:289:10)
  xfs_trans_read_buf (./fs/xfs/xfs_trans.h:212:9)
  xfs_read_agi (fs/xfs/libxfs/xfs_ialloc.c:2598:10)
  xfs_ialloc_read_agi (fs/xfs/libxfs/xfs_ialloc.c:2626:10)
  xfs_dialloc_try_ag (fs/xfs/libxfs/xfs_ialloc.c:1690:10)
  xfs_dialloc (fs/xfs/libxfs/xfs_ialloc.c:1803:12)
  xfs_create (fs/xfs/xfs_inode.c:1020:10)
  xfs_generic_create (fs/xfs/xfs_iops.c:199:11)
  vfs_mkdir (fs/namei.c:4120:10)
  do_mkdirat (fs/namei.c:4143:11)
  __do_sys_mkdir (fs/namei.c:4163:9)
  __se_sys_mkdir (fs/namei.c:4161:1)
  __x64_sys_mkdir (fs/namei.c:4161:1)
  do_syscall_x64 (arch/x86/entry/common.c:50:14)
  do_syscall_64 (arch/x86/entry/common.c:80:7)
  entry_SYSCALL_64+0xaa/0x1a6 (arch/x86/entry/entry_64.S:120)

Task C above was trying to allocate an inode chunk to serve a mkdir() syscall
request. Task C locked AG 3's AGF and searched for the required
extent. However, the only suitable extent was found to be straddling
xfs_alloc_arg->max_agbno. Hence, it moved to the next AG and ended up wrapping
around the AG list.

-- 
Chandan
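
For reference, the AG locking order rule described above can be modelled in a
few lines. This is only a userspace sketch, not kernel code: the (agno, kind)
tuples and the violates_lock_order() helper are hypothetical names made up for
illustration, and it assumes each task's lock requests are listed in the order
they were issued.

```python
# Userspace model of the AG locking order rule; not kernel code.
# Assumption: a lock request is an (agno, kind) tuple, kind is "AGI" or "AGF".
KIND_RANK = {"AGI": 0, "AGF": 1}  # within an AG, the AGI comes before the AGF


def violates_lock_order(requests):
    """Return the first (held, wanted) pair that breaks the rule, else None.

    The rule: AGI/AGF buffers are locked in increasing AG order and,
    within one AG, the AGI is locked before the AGF.
    """
    held = []
    for agno, kind in requests:
        wanted = (agno, KIND_RANK[kind])
        for h_agno, h_kind in held:
            # A request that sorts before an already-held lock is out of order.
            if wanted < (h_agno, KIND_RANK[h_kind]):
                return (h_agno, h_kind), (agno, kind)
        held.append((agno, kind))
    return None


# Lock requests from the table above, in the order each task issued them.
task_a = [(1, "AGF"), (3, "AGF")]
task_b = [(2, "AGI"), (1, "AGF")]
task_c = [(3, "AGF"), (2, "AGI")]
```

Under this model Task A's sequence passes the check, while the sequences of
Task B and Task C are both flagged, which matches the analysis above.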