> Was this the 'ONLY' dmsetup in your listing (i.e. you reproduced case
> again)?

This was the original instance of the problem. Today I have rebooted and reproduced the problem on a fresh kernel.

> I mean - your existing reported situation was already hopeless and
> needed reboot - as if flushing suspend holds some mutexes - no other
> suspend call can fix it -> you usually have just 1 chance to fix it
> in right way, if you go wrong way reboot is unavoidable.

That sounds like a very unforgiving kernel bug, if you only have one chance to fix the problem ;-)

Here is my attempt on the fresh kernel. I received some write errors in dmesg, so I tried to umount the dm device to confirm I had reproduced the problem, and when umount failed to exit I tried this:

$ dmsetup reload backup --table "0 11720531968 error"
$ dmsetup suspend --noflush --nolockfs backup

These two worked fine this time. "dmsetup suspend" was locking up before, but now it completed.

$ umount /mnt/backup
umount: /mnt/backup: not mounted

The dm instance is no longer mounted.

$ mdadm --manage --stop /dev/md10
mdadm: Cannot get exclusive access to /dev/md10:Perhaps a running process, mounted filesystem or active volume group?

I can't restart the underlying RAID array though, as the dm instance is still holding onto its devices.

$ dmsetup remove --force backup
device-mapper: remove ioctl on backup failed: Device or resource busy
Command failed

I don't appear to be able to shut down the dm device either.

I tried to umount the device before any of this, and that umount process has frozen (even though the filesystem does appear to have been unmounted), so this is probably what the kernel thinks is still using the device. Although the table has been replaced by the "error" target, the umount process is not returning and appears to be frozen inside the kernel (even killall -9 has no effect).

Strangely, I can still read and write to the underlying device (/dev/md10); it is only processes accessing /dev/mapper/backup that freeze.

Any suggestions? I imagine "dmsetup remove --deferred" won't help if the umount process is holding the device open and never terminates.
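For what it's worth, this is the deferred removal I have in mind - just a sketch, since I haven't run it against the wedged device, and the behaviour described in the comments is my understanding of how deferred remove is supposed to work rather than something I've verified here:

$ dmsetup remove --deferred backup
  # marks "backup" for removal once its last opener closes it
$ dmsetup info backup
  # while the stuck umount still holds a reference I'd expect the device
  # to remain present, just flagged for deferred removal - i.e. in this
  # situation it would never actually go away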
It still looks like once you get an I/O error, the dm device locks up and a reboot is the only way to get it to let go of the storage device underlying the dm device.

Not sure if this helps, but this is where 'sync' and 'umount' lock up when the system is in this state:

sync            D ffff880121ff7e00     0 23685  23671 0x00000004
 ffff880121ff7e00 ffff88040d7a28c0 ffff88040b498a30 dead000000100100
 ffff880121ff8000 ffff8800d96ff068 ffff8800d96ff080 ffffffff81213800
 ffff8800d96ff068 ffff880121ff7e20 ffffffff81588377 ffff880037bbc068
Call Trace:
 [<ffffffff81213800>] ? SyS_tee+0x400/0x400
 [<ffffffff81588377>] schedule+0x37/0x90
 [<ffffffff8158a805>] rwsem_down_read_failed+0xd5/0x120
 [<ffffffff8120cf64>] ? sync_inodes_sb+0x184/0x1e0
 [<ffffffff812d7b24>] call_rwsem_down_read_failed+0x14/0x30
 [<ffffffff8158a1d7>] ? down_read+0x17/0x20
 [<ffffffff811e49d4>] iterate_supers+0xa4/0x120
 [<ffffffff81213b94>] sys_sync+0x44/0xb0
 [<ffffffff8158bfae>] system_call_fastpath+0x12/0x71

umount          R  running task        0 23669  18676 0x00000004
 00000000000000cb ffff880108607d78 0000000000000000 000000000000020e
 0000000000000000 0000000000000000 ffff880108604000 ffff88040b49dbb0
 00000000000000e4 00000000000000ff 0000000000000000 ffff8800d972b800
Call Trace:
 [<ffffffffa00d13a9>] ? jbd2_log_do_checkpoint+0x19/0x4b0 [jbd2]
 [<ffffffffa00d13bd>] ? jbd2_log_do_checkpoint+0x2d/0x4b0 [jbd2]
 [<ffffffffa00d6520>] ? jbd2_journal_destroy+0x140/0x240 [jbd2]
 [<ffffffff810bc720>] ? wake_atomic_t_function+0x60/0x60
 [<ffffffffa019f6d7>] ? ext4_put_super+0x67/0x360 [ext4]
 [<ffffffff811e3216>] ? generic_shutdown_super+0x76/0x100
 [<ffffffff811e35d7>] ? kill_block_super+0x27/0x80
 [<ffffffff811e3949>] ? deactivate_locked_super+0x49/0x80
 [<ffffffff811e3dbc>] ? deactivate_super+0x6c/0x80
 [<ffffffff81201863>] ? cleanup_mnt+0x43/0xa0
 [<ffffffff81201912>] ? __cleanup_mnt+0x12/0x20
 [<ffffffff81095c54>] ? task_work_run+0xd4/0xf0
 [<ffffffff81015d25>] ? do_notify_resume+0x75/0x80
 [<ffffffff8158c17c>] ? int_signal+0x12/0x17

Looks like umount might be stuck in an infinite loop; when I run another trace, it's slightly different:

umount          R  running task        0 23669  18676 0x00000004
 ffffffffffffff02 ffffffffa00d0f2e 0000000000000010 0000000000000292
 ffff880108607c98 0000000000000018 0000000000000000 ffff8800d972b800
 ffffffffffffff02 ffffffffa00d13c0 00000000f8eef941 0000000000000296
Call Trace:
 [<ffffffffa00d0f2e>] ? jbd2_cleanup_journal_tail+0xe/0xb0 [jbd2]
 [<ffffffffa00d13c0>] ? jbd2_log_do_checkpoint+0x30/0x4b0 [jbd2]
 [<ffffffffa00d13bd>] ? jbd2_log_do_checkpoint+0x2d/0x4b0 [jbd2]
 [<ffffffffa00d6518>] ? jbd2_journal_destroy+0x138/0x240 [jbd2]
 [<ffffffff810bc720>] ? wake_atomic_t_function+0x60/0x60
 [<ffffffffa019f6d7>] ? ext4_put_super+0x67/0x360 [ext4]
 [<ffffffff811e3216>] ? generic_shutdown_super+0x76/0x100
 [<ffffffff811e35d7>] ? kill_block_super+0x27/0x80
 [<ffffffff811e3949>] ? deactivate_locked_super+0x49/0x80
 [<ffffffff811e3dbc>] ? deactivate_super+0x6c/0x80
 [<ffffffff81201863>] ? cleanup_mnt+0x43/0xa0
 [<ffffffff81201912>] ? __cleanup_mnt+0x12/0x20
 [<ffffffff81095c54>] ? task_work_run+0xd4/0xf0
 [<ffffffff81015d25>] ? do_notify_resume+0x75/0x80
 [<ffffffff8158c17c>] ? int_signal+0x12/0x17

Thanks,

Adam.

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel