On Thu, Aug 06, 2020 at 01:47:33PM -0400, Jan Stancek wrote:
> Hi,
>
> I'm seeing sporadic mkfs.ext[23] hangs on loop device while running various
> LTP tests. It seems to hang indefinitely once in bad state:
>   0 D root 29782 29761 0 80 0 - 1006 rq_qos 15:09 ? 00:00:00 mkfs.ext3 /dev/loop0
>
> [19809.932566] mkfs.ext3 D 0 29782 29761 0x00000000
> [19809.934000] Call trace:
> [19809.934624]  __switch_to+0xfc/0x150
> [19809.935533]  __schedule+0x364/0x828
> [19809.936432]  schedule+0x58/0xe0
> [19809.937261]  io_schedule+0x24/0xc0
> [19809.938144]  rq_qos_wait+0xe4/0x150
> [19809.939044]  wbt_wait+0x98/0xd8
> [19809.939864]  __rq_qos_throttle+0x38/0x50
> [19809.940847]  blk_mq_submit_bio+0x108/0x620
> [19809.941890]  submit_bio_noacct+0x358/0x3d8
> [19809.942909]  submit_bio+0x40/0x1a8
> [19809.943770]  submit_bh_wbc+0x16c/0x1e8
> [19809.944701]  __block_write_full_page+0x238/0x5c8
> [19809.945862]  block_write_full_page+0x124/0x138
> [19809.947000]  blkdev_writepage+0x24/0x30
> [19809.948031]  __writepage+0x28/0xc8
> [19809.948905]  write_cache_pages+0x1ac/0x410
> [19809.949988]  generic_writepages+0x4c/0x88
> [19809.950947]  blkdev_writepages+0x18/0x28
> [19809.951934]  do_writepages+0x40/0xe8
> [19809.952856]  __filemap_fdatawrite_range+0xe0/0x150
> [19809.954066]  file_write_and_wait_range+0x9c/0x108
> [19809.955266]  blkdev_fsync+0x24/0x50
> [19809.956170]  vfs_fsync_range+0x3c/0x88
> [19809.957126]  do_fsync+0x44/0x90
> [19809.957925]  __arm64_sys_fsync+0x20/0x30
> [19809.958961]  el0_svc_common.constprop.0+0x7c/0x188
> [19809.960242]  do_el0_svc+0x2c/0x98
> [19809.961028]  el0_sync_handler+0x84/0x110
> [19809.962003]  el0_sync+0x15c/0x180
>
> It started happening in recent weeks and appears to be aarch64 exclusive so far.
>
> Affected kernels are at least:
>   v5.8-475-g382625d0d432
>   v5.8-607-gcdc8fcb49905
>   v5.8-rc2-87-g6b7b181b67aa
>   v5.8-rc2-105-g492d76b21566
>
> 6b7b181b67aa is the oldest commit I could reproduce it with, but my current
> reproducer (running LTP fgetxattr01 in loop for 30 minutes) doesn't look very
> reliable for bisect.
>
> Does this ring any bells?

I saw this kind of I/O hang reliably in the ltp/fs_fill test, where the loop
device is backed by an image in tmpfs:

https://lkml.org/lkml/2020/7/26/77

And I have verified that the following patch fixes the issue:

https://lore.kernel.org/linux-block/bc5fa941-3b7c-f28e-dd46-1a1d6e5c40a8@xxxxxxxxx/T/#t

Thanks,
Ming