On Fri 14-02-20 18:24:50, Yang Xu wrote:
> on 2020/02/14 5:10, Jan Kara wrote:
> > On Thu 13-02-20 16:49:21, Yang Xu wrote:
> > > > > When I test generic/269(ext4) on 5.6.0-rc1 kernel, it hangs.
> > > > > ----------------------------------------------
> > > > > dmesg as below:
> > > > > [   76.506753] run fstests generic/269 at 2020-02-11 05:53:44
> > > > > [   76.955667] EXT4-fs (sdc): mounted filesystem with ordered data mode.
> > > > > Opts: acl, user_xattr
> > > > > [  100.912511] device virbr0-nic left promiscuous mode
> > > > > [  100.912520] virbr0: port 1(virbr0-nic) entered disabled state
> > > > > [  246.801561] INFO: task dd:17284 blocked for more than 122 seconds.
> > > > > [  246.801564] Not tainted 5.6.0-rc1 #41
> > > > > [  246.801565] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> > > > > this message.
> > > > > [  246.801566] dd  D  0  17284  16931  0x00000080
> > > > > [  246.801568] Call Trace:
> > > > > [  246.801584]  ? __schedule+0x251/0x690
> > > > > [  246.801586]  schedule+0x40/0xb0
> > > > > [  246.801588]  wb_wait_for_completion+0x52/0x80
> > > > > [  246.801591]  ? finish_wait+0x80/0x80
> > > > > [  246.801592]  __writeback_inodes_sb_nr+0xaa/0xd0
> > > > > [  246.801593]  try_to_writeback_inodes_sb+0x3c/0x50
> > > > Interesting. Does the hang resolve eventually or the machine is hung
> > > > permanently? If the hang is permanent, can you do:
> > > >
> > > > echo w >/proc/sysrq-trigger
> > > >
> > > > and send us the stacktraces from dmesg? Thanks!
> > > Yes. the hang is permanent, log as below:
> full dmesg as attach
...

Thanks! So the culprit seems to be:

> [  388.087799] kworker/u12:0  D  0  32  2  0x80004000
> [  388.087803] Workqueue: writeback wb_workfn (flush-8:32)
> [  388.087805] Call Trace:
> [  388.087810]  ? __schedule+0x251/0x690
> [  388.087811]  ? __switch_to_asm+0x34/0x70
> [  388.087812]  ? __switch_to_asm+0x34/0x70
> [  388.087814]  schedule+0x40/0xb0
> [  388.087816]  schedule_timeout+0x20d/0x310
> [  388.087818]  io_schedule_timeout+0x19/0x40
> [  388.087819]  wait_for_completion_io+0x113/0x180
> [  388.087822]  ? wake_up_q+0xa0/0xa0
> [  388.087824]  submit_bio_wait+0x5b/0x80
> [  388.087827]  blkdev_issue_flush+0x81/0xb0
> [  388.087834]  jbd2_cleanup_journal_tail+0x80/0xa0 [jbd2]
> [  388.087837]  jbd2_log_do_checkpoint+0xf4/0x3f0 [jbd2]
> [  388.087840]  __jbd2_log_wait_for_space+0x66/0x190 [jbd2]
> [  388.087843]  ? finish_wait+0x80/0x80
> [  388.087845]  add_transaction_credits+0x27d/0x290 [jbd2]
> [  388.087847]  ? blk_mq_make_request+0x289/0x5d0
> [  388.087849]  start_this_handle+0x10a/0x510 [jbd2]
> [  388.087851]  ? _cond_resched+0x15/0x30
> [  388.087853]  jbd2__journal_start+0xea/0x1f0 [jbd2]
> [  388.087869]  ? ext4_writepages+0x518/0xd90 [ext4]
> [  388.087875]  __ext4_journal_start_sb+0x6e/0x130 [ext4]
> [  388.087883]  ext4_writepages+0x518/0xd90 [ext4]
> [  388.087886]  ? do_writepages+0x41/0xd0
> [  388.087893]  ? ext4_mark_inode_dirty+0x1f0/0x1f0 [ext4]
> [  388.087894]  do_writepages+0x41/0xd0
> [  388.087896]  ? snprintf+0x49/0x60
> [  388.087898]  __writeback_single_inode+0x3d/0x340
> [  388.087899]  writeback_sb_inodes+0x1e5/0x480
> [  388.087901]  wb_writeback+0xfb/0x2f0
> [  388.087902]  wb_workfn+0xf0/0x430
> [  388.087903]  ? __switch_to_asm+0x34/0x70
> [  388.087905]  ? finish_task_switch+0x75/0x250
> [  388.087907]  process_one_work+0x1a7/0x370
> [  388.087909]  worker_thread+0x30/0x380
> [  388.087911]  ? process_one_work+0x370/0x370
> [  388.087912]  kthread+0x10c/0x130
> [  388.087913]  ? kthread_park+0x80/0x80
> [  388.087914]  ret_from_fork+0x35/0x40

This process is actually waiting for IO to complete while holding
checkpoint_mutex, which holds up everybody else. The question is why the IO
doesn't complete - that's definitely outside of the filesystem. Maybe a bug
in the block layer, storage driver, or something like that... What does
'cat /sys/block/<device-with-xfstests>/inflight' show?

								Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
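
As a rough sketch of how those in-flight counters could be sampled while the
dd task is stuck (this assumes the xfstests device is sdc, as in the EXT4-fs
line of the dmesg above; adjust the device name for your setup):

  # Print the block layer's in-flight request counters once a second.
  # The two columns are outstanding reads and writes; a write count that
  # never drops back to 0 would suggest the flush bio really is stuck
  # below the filesystem, in the block layer or the storage driver.
  while true; do
          date '+%T'
          cat /sys/block/sdc/inflight
          sleep 1
  done

Capturing a few of these samples alongside another 'echo w >
/proc/sysrq-trigger' dump would show whether the request count ever changes
while the jbd2 checkpoint is waiting.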