On Mon, Oct 16, 2023 at 08:37:25PM -0700, Andres Freund wrote:
> I just was able to reproduce the issue, after upgrading to 6.6-rc6 - this time
> it took ~55min of high load (io_uring using branch of postgres, running a
> write heavy transactional workload concurrently with concurrent bulk data
> load) to trigger the issue.
>
> For now I have left the system running, in case there's something you would
> like me to check while the system is hung.
>
> The first hanging task that I observed:
>
> cat /proc/57606/stack
> [<0>] inode_dio_wait+0xd5/0x100
> [<0>] ext4_fallocate+0x12f/0x1040
> [<0>] vfs_fallocate+0x135/0x360
> [<0>] __x64_sys_fallocate+0x42/0x70
> [<0>] do_syscall_64+0x38/0x80
> [<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

This stack trace is from some process (presumably postgres) trying to do
a fallocate() system call:

	/* Wait all existing dio workers, newcomers will block on i_rwsem */
	inode_dio_wait(inode);

The reason for this is that we can't manipulate the extent tree until
any data block I/Os complete.  This will block until
iomap_dio_complete() in fs/iomap/direct-io.c calls inode_dio_end().

> [ 3194.579297] INFO: task iou-wrk-58004:58874 blocked for more than 122 seconds.
> [ 3194.579304]       Not tainted 6.6.0-rc6-andres-00001-g01edcfe38260 #77
> [ 3194.579310] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 3194.579314] task:iou-wrk-58004   state:D stack:0     pid:58874 ppid:52606  flags:0x00004000
> [ 3194.579325] Call Trace:
> [ 3194.579329]  <TASK>
> [ 3194.579334]  __schedule+0x388/0x13e0
> [ 3194.579349]  schedule+0x5f/0xe0
> [ 3194.579361]  schedule_preempt_disabled+0x15/0x20
> [ 3194.579374]  rwsem_down_read_slowpath+0x26e/0x4c0
> [ 3194.579385]  down_read+0x44/0xa0
> [ 3194.579393]  ext4_file_write_iter+0x432/0xa80
> [ 3194.579407]  io_write+0x129/0x420

This could potentially be an interesting stack trace; but this is where
we really need to map the stack addresses to line numbers.  Is that
something you could do?
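
One possible way to do that mapping is the scripts/faddr2line helper that
ships in the kernel source tree; it takes an object file followed by
function+offset/size arguments.  The sketch below only builds such a command
line from a pasted trace.  It assumes a vmlinux from the same build, compiled
with debug info, sitting in the top of the kernel tree; the Python helper
names (extract_frames, faddr2line_command) are hypothetical and exist only
for this illustration.

```python
import re
import shlex

# Matches "symbol+0xoff/0xsize" as printed in /proc/<pid>/stack and in
# hung-task call traces.
FRAME_RE = re.compile(r"\b([A-Za-z_][\w.]*\+0x[0-9a-f]+/0x[0-9a-f]+)\b")


def extract_frames(trace: str) -> list[str]:
    """Return the symbol+offset/size token of every frame line in the trace."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append(match.group(1))
    return frames


def faddr2line_command(trace: str, vmlinux: str = "vmlinux") -> str:
    """Build a scripts/faddr2line invocation covering every frame found.

    The vmlinux path is an assumption; it must come from the same build
    as the running kernel and contain debug info.
    """
    args = [vmlinux, *extract_frames(trace)]
    return "scripts/faddr2line " + " ".join(shlex.quote(a) for a in args)


SAMPLE = """\
[ 3194.579385]  down_read+0x44/0xa0
[ 3194.579393]  ext4_file_write_iter+0x432/0xa80
[ 3194.579407]  io_write+0x129/0x420
"""

if __name__ == "__main__":
    # Print the command to run from the top of the kernel source tree.
    print(faddr2line_command(SAMPLE))
```

Frames that live in a loadable module would need that module's .ko file
passed in place of vmlinux; the sketch does not try to detect that case.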

> Once I hear that you don't want me to test something out on the running
> system, I think a sensible next step could be to compile with lockdep and see
> if that finds a problem?

That's certainly a possibility.  But also please make sure that you
compile with debugging information enabled so that we can get
reliable line numbers.  I use:

CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
CONFIG_DEBUG_INFO_REDUCED=y

Cheers,

					- Ted