Jens,

On 11/9/16 05:21, Jens Axboe wrote:
> On 11/08/2016 12:55 PM, Logan Gunthorpe wrote:
>> Hey,
>>
>> I've attached the output of dmesg from a working boot and the output of
>> mount.
>>
>> Pretty much all the file systems are ext4. We have some experimental
>> nvme devices in this system which I did try removing to eliminate that
>> possibility.
>>
>> Let me know if you need anything else.
>
> You're using dm, that might be related. Mike, have you tried booting
> for-4.10/block and checking if dm works fine?

Using yesterday's tree, I experienced similar problems with
for-4.10/block without using dm (using ext4 on top of SSDs): random
tasks hung, starting from boot, with the machine eventually completely
freezing.

I did not dig into the problem a lot. I just looked at task stack traces
(echo t > /proc/sysrq-trigger) and noticed that hung tasks are waiting
for requests. Ex:

[ 55.356418] plymouthd D ffffffff81671758 0 353 1 0x00000000
[ 55.356419] ffff8807fbf1ec00 0000000000000000 ffff8807fba6d500 ffff8807fba3b600
[ 55.356420] ffff88081fb97900 ffff8807f04079a8 ffffffff81671758 000000000000158f
[ 55.356421] 0000000000000000 ffff8807f3373800 ffff8807fba3b600 ffff88081fb97900
[ 55.356421] Call Trace:
[ 55.356421] [<ffffffff81671758>] ? __schedule+0x178/0x650
[ 55.356422] [<ffffffff81671c70>] schedule+0x40/0x90
[ 55.356423] [<ffffffff816749d1>] schedule_timeout+0x2b1/0x3e0
[ 55.356424] [<ffffffff8115419d>] ? mempool_alloc_slab+0x1d/0x30
[ 55.356425] [<ffffffff810e0971>] ? ktime_get+0x41/0xb0
[ 55.356426] [<ffffffff81671574>] io_schedule_timeout+0xa4/0x110
[ 55.356427] [<ffffffff8130ee2b>] get_request+0x3fb/0x7d0
[ 55.356428] [<ffffffff8120fd83>] ? __find_get_block+0xf3/0x180
[ 55.356429] [<ffffffff810be260>] ? wait_woken+0x90/0x90
[ 55.356431] [<ffffffff813117cb>] blk_queue_bio+0xfb/0x3c0
[ 55.356432] [<ffffffff8130fb90>] generic_make_request+0xd0/0x180
[ 55.356433] [<ffffffff8130fcac>] submit_bio+0x6c/0x130
[ 55.356436] [<ffffffff81270f08>] ext4_io_submit+0x38/0x50
[ 55.356437] [<ffffffff8126c241>] ext4_writepages+0x561/0xdb0
[ 55.356439] [<ffffffff811601e1>] do_writepages+0x21/0x30
[ 55.356440] [<ffffffff811520aa>] __filemap_fdatawrite_range+0xaa/0xf0
[ 55.356440] [<ffffffff811524df>] ? __generic_file_write_iter+0x14f/0x1d0
[ 55.356441] [<ffffffff8115213c>] filemap_flush+0x1c/0x20
[ 55.356442] [<ffffffff812698bc>] ext4_alloc_da_blocks+0x2c/0x80
[ 55.356443] [<ffffffff81262268>] ext4_release_file+0x78/0xc0
[ 55.356446] [<ffffffff811db2a9>] __fput+0xb9/0x200
[ 55.356447] [<ffffffff811db42e>] ____fput+0xe/0x10
[ 55.356449] [<ffffffff81097bf5>] task_work_run+0x85/0xb0
[ 55.356450] [<ffffffff810016a7>] exit_to_usermode_loop+0x97/0xa0
[ 55.356451] [<ffffffff810019e3>] syscall_return_slowpath+0x53/0x60
[ 55.356452] [<ffffffff8167605f>] entry_SYSCALL_64_fastpath+0x92/0x94

I needed the ZBC code so I detached the head back to 5f2808f and
everything then worked fine. I will try to bisect.

Best regards.

--
Damien Le Moal, Ph.D.
Sr. Manager, System Software Research Group,
Western Digital Corporation
Damien.LeMoal@xxxxxxx
(+81) 0466-98-3593 (ext. 513593)
1 kirihara-cho, Fujisawa, Kanagawa, 252-0888 Japan
www.wdc.com, www.hgst.com
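
Editorial sketch (not part of the original message): the manual check described above, reading the sysrq-t task dump and spotting tasks stuck waiting for requests, could be automated roughly as follows. This is a minimal sketch that assumes the dump uses the same line layout as the trace quoted in this message. The names tasks_blocked_in, TASK_RE and FRAME_RE are hypothetical helpers introduced here for illustration, and SAMPLE is an excerpt of the trace above. Frames marked "?" are treated as unreliable and ignored.

```python
import re

# Task header line in a sysrq-t dump (layout assumed from the trace above), e.g.
# "[ 55.356418] plymouthd D ffffffff81671758 0 353 1 0x00000000"
TASK_RE = re.compile(
    r"^\[\s*[\d.]+\]\s+(\S+)\s+([A-Z])\s+[0-9a-f]{16}\s+\d+\s+(\d+)\s"
)
# Call-trace frame, e.g. "[ 55.356427] [<ffffffff8130ee2b>] get_request+0x3fb/0x7d0".
# An optional "? " before the symbol marks a frame the unwinder is unsure about.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x")


def tasks_blocked_in(log_text, symbol, state="D"):
    """Return (name, pid) pairs for tasks in `state` with a reliable frame in `symbol`."""
    hits = []
    current = None  # (name, pid, state) of the task whose trace is being read
    for raw in log_text.splitlines():
        line = raw.strip()
        header = TASK_RE.match(line)
        if header:
            current = (header.group(1), int(header.group(3)), header.group(2))
            continue
        frame = FRAME_RE.search(line)
        if frame and current and current[2] == state:
            unreliable, func = frame.group(1), frame.group(2)
            if func == symbol and not unreliable:
                hit = current[:2]
                if hit not in hits:
                    hits.append(hit)
    return hits


# Excerpt of the trace quoted in this message.
SAMPLE = """\
[ 55.356418] plymouthd D ffffffff81671758 0 353 1 0x00000000
[ 55.356421] Call Trace:
[ 55.356421] [<ffffffff81671758>] ? __schedule+0x178/0x650
[ 55.356423] [<ffffffff816749d1>] schedule_timeout+0x2b1/0x3e0
[ 55.356424] [<ffffffff8115419d>] ? mempool_alloc_slab+0x1d/0x30
[ 55.356427] [<ffffffff8130ee2b>] get_request+0x3fb/0x7d0
[ 55.356431] [<ffffffff813117cb>] blk_queue_bio+0xfb/0x3c0
"""

if __name__ == "__main__":
    print(tasks_blocked_in(SAMPLE, "get_request"))
```

Run against a full dump (for example, saved dmesg output after triggering sysrq-t), this would list every uninterruptible task whose trace passes through get_request, which is the pattern reported here.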