Hello, tl;dr: it seems like fsck fails to notice when the block device disappears from under it. I have the following setup: * external USB disk * a partition with LUKS in it * ext4 filesystem inside the LUKS block device While doing backups to it I noticed that after some time backups would fail with an error (failed to write, ). In the following log I'm attaching the disk, enabling LUKS and mounting the disk and then starting the backup, which will fail after a while: May 25 11:39:18 hier kernel: [76241.727367] usb 4-1.1: new high-speed USB device number 6 using ehci-pci May 25 11:39:19 hier kernel: [76241.881892] usb 4-1.1: New USB device found, idVendor=0bc2, idProduct=3320 May 25 11:39:19 hier kernel: [76241.881902] usb 4-1.1: New USB device strings: Mfr=2, Product=3, SerialNumber=1 May 25 11:39:19 hier kernel: [76241.881907] usb 4-1.1: Product: Expansion Desk May 25 11:39:19 hier kernel: [76241.881911] usb 4-1.1: Manufacturer: Seagate May 25 11:39:19 hier kernel: [76241.881915] usb 4-1.1: SerialNumber: NA4JDN4N May 25 11:39:19 hier kernel: [76241.882652] usb-storage 4-1.1:1.0: USB Mass Storage device detected May 25 11:39:19 hier kernel: [76241.883007] scsi7 : usb-storage 4-1.1:1.0 May 25 11:39:20 hier kernel: [76242.907527] usb 4-1.1: USB disconnect, device number 6 May 25 11:39:30 hier kernel: [76252.987833] usb 4-1.1: new high-speed USB device number 7 using ehci-pci May 25 11:39:30 hier kernel: [76253.142659] usb 4-1.1: New USB device found, idVendor=0bc2, idProduct=3320 May 25 11:39:30 hier kernel: [76253.142669] usb 4-1.1: New USB device strings: Mfr=2, Product=3, SerialNumber=1 May 25 11:39:30 hier kernel: [76253.142674] usb 4-1.1: Product: Expansion Desk May 25 11:39:30 hier kernel: [76253.142678] usb 4-1.1: Manufacturer: Seagate May 25 11:39:30 hier kernel: [76253.142681] usb 4-1.1: SerialNumber: NA4JDN4N May 25 11:39:30 hier kernel: [76253.143465] usb-storage 4-1.1:1.0: USB Mass Storage device detected May 25 11:39:30 hier kernel: [76253.144285] scsi8 : usb-storage 4-1.1:1.0 May 25 11:39:31 hier kernel: [76254.144816] scsi 8:0:0:0: Direct-Access Seagate Expansion Desk 070B PQ: 0 ANSI: 6 May 25 11:39:31 hier kernel: [76254.145528] sd 8:0:0:0: Attached scsi generic sg1 type 0 May 25 11:39:31 hier kernel: [76254.146193] sd 8:0:0:0: [sdb] 732566645 4096-byte logical blocks: (3.00 TB/2.72 TiB) May 25 11:39:31 hier kernel: [76254.147465] sd 8:0:0:0: [sdb] Write Protect is off May 25 11:39:31 hier kernel: [76254.147477] sd 8:0:0:0: [sdb] Mode Sense: 43 00 00 00 May 25 11:39:31 hier kernel: [76254.148657] sd 8:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA May 25 11:39:31 hier kernel: [76254.149907] sd 8:0:0:0: [sdb] 732566645 4096-byte logical blocks: (3.00 TB/2.72 TiB) May 25 11:39:31 hier kernel: [76254.179482] sdb: sdb1 sdb2 May 25 11:39:31 hier kernel: [76254.181592] sd 8:0:0:0: [sdb] 732566645 4096-byte logical blocks: (3.00 TB/2.72 TiB) May 25 11:39:31 hier kernel: [76254.184058] sd 8:0:0:0: [sdb] Attached SCSI disk May 25 11:39:40 hier kernel: [76263.066639] device-mapper: uevent: version 1.0.3 May 25 11:39:40 hier kernel: [76263.066841] device-mapper: ioctl: 4.27.0-ioctl (2013-10-30) initialised: dm-devel@xxxxxxxxxx May 25 11:39:40 hier kernel: [76263.706456] NET: Registered protocol family 38 May 25 11:39:41 hier kernel: [76263.976836] sha256_ssse3: Using AVX optimized SHA-256 implementation May 25 11:39:45 hier kernel: [76268.130879] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null) May 25 12:39:32 hier kernel: [79854.051465] usb 4-1.1: USB disconnect, device number 7 May 25 12:39:32 hier kernel: [79854.052460] sd 8:0:0:0: [sdb] Synchronizing SCSI cache May 25 12:39:32 hier kernel: [79854.052634] sd 8:0:0:0: [sdb] May 25 12:39:32 hier kernel: [79854.052636] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK May 25 12:39:32 hier kernel: [79854.250864] usb 4-1.1: new high-speed USB device number 8 using ehci-pci May 25 12:39:32 hier kernel: [79854.405885] usb 4-1.1: New USB device found, idVendor=0bc2, idProduct=3320 May 25 12:39:32 hier kernel: [79854.405895] usb 4-1.1: New USB device strings: Mfr=2, Product=3, SerialNumber=1 May 25 12:39:32 hier kernel: [79854.405900] usb 4-1.1: Product: Expansion Desk May 25 12:39:32 hier kernel: [79854.405904] usb 4-1.1: Manufacturer: Seagate May 25 12:39:32 hier kernel: [79854.405908] usb 4-1.1: SerialNumber: NA4JDN4N May 25 12:39:32 hier kernel: [79854.406710] usb-storage 4-1.1:1.0: USB Mass Storage device detected May 25 12:39:32 hier kernel: [79854.407068] scsi9 : usb-storage 4-1.1:1.0 May 25 12:39:32 hier kernel: [79854.655950] Buffer I/O error on device dm-0, logical block 464117312 May 25 12:39:32 hier kernel: [79854.655959] Buffer I/O error on device dm-0, logical block 464117312 May 25 12:39:33 hier kernel: [79855.407979] scsi 9:0:0:0: Direct-Access Seagate Expansion Desk 070B PQ: 0 ANSI: 6 May 25 12:39:33 hier kernel: [79855.408847] sd 9:0:0:0: Attached scsi generic sg1 type 0 May 25 12:39:33 hier kernel: [79855.410055] sd 9:0:0:0: [sdc] 732566645 4096-byte logical blocks: (3.00 TB/2.72 TiB) May 25 12:39:33 hier kernel: [79855.411191] sd 9:0:0:0: [sdc] Write Protect is off May 25 12:39:33 hier kernel: [79855.411202] sd 9:0:0:0: [sdc] Mode Sense: 43 00 00 00 May 25 12:39:33 hier kernel: [79855.412402] sd 9:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA May 25 12:39:33 hier kernel: [79855.413245] sd 9:0:0:0: [sdc] 732566645 4096-byte logical blocks: (3.00 TB/2.72 TiB) May 25 12:39:37 hier kernel: [79858.940027] sdc: sdc1 sdc2 May 25 12:39:37 hier kernel: [79858.942029] sd 9:0:0:0: [sdc] 732566645 4096-byte logical blocks: (3.00 TB/2.72 TiB) May 25 12:39:37 hier kernel: [79858.944241] sd 9:0:0:0: [sdc] Attached SCSI disk May 25 12:39:51 hier kernel: [79872.773254] Buffer I/O error on device dm-0, logical block 17 May 25 12:39:51 hier kernel: [79872.773259] lost page write due to I/O error on dm-0 May 25 12:39:51 hier kernel: [79872.773295] Buffer I/O error on device dm-0, logical block 68681744 May 25 12:39:51 hier kernel: [79872.773297] lost page write due to I/O error on dm-0 May 25 12:39:51 hier kernel: [79872.773327] Buffer I/O error on device dm-0, logical block 68681774 May 25 12:39:51 hier kernel: [79872.773328] lost page write due to I/O error on dm-0 May 25 12:39:51 hier kernel: [79872.773359] Buffer I/O error on device dm-0, logical block 68681983 May 25 12:39:51 hier kernel: [79872.773360] lost page write due to I/O error on dm-0 May 25 12:39:51 hier kernel: [79872.773390] Buffer I/O error on device dm-0, logical block 68690183 May 25 12:39:51 hier kernel: [79872.773391] lost page write due to I/O error on dm-0 May 25 12:39:51 hier kernel: [79872.773421] Buffer I/O error on device dm-0, logical block 71303343 May 25 12:39:51 hier kernel: [79872.773422] lost page write due to I/O error on dm-0 May 25 12:39:51 hier kernel: [79872.773452] Buffer I/O error on device dm-0, logical block 72876154 May 25 12:39:51 hier kernel: [79872.773453] lost page write due to I/O error on dm-0 May 25 12:39:51 hier kernel: [79872.773483] Buffer I/O error on device dm-0, logical block 77070492 May 25 12:39:51 hier kernel: [79872.773485] lost page write due to I/O error on dm-0 May 25 12:39:51 hier kernel: [79872.773514] Buffer I/O error on device dm-0, logical block 98566225 May 25 12:39:51 hier kernel: [79872.773515] lost page write due to I/O error on dm-0 May 25 12:39:51 hier kernel: [79872.773558] Aborting journal on device dm-0-8. May 25 12:39:51 hier kernel: [79872.773588] Buffer I/O error on device dm-0, logical block 231768064 May 25 12:39:51 hier kernel: [79872.773590] lost page write due to I/O error on dm-0 May 25 12:39:51 hier kernel: [79872.773600] JBD2: Error -5 detected when updating journal superblock for dm-0-8. May 25 12:39:51 hier kernel: [79872.773817] EXT4-fs warning (device dm-0): ext4_end_bio:317: I/O error -19 writing to inode 17173338 (offset 2490368 size 131072 starting block 148009568) May 25 12:39:51 hier kernel: [79872.773821] Buffer I/O error on device dm-0, logical block 148009568 May 25 12:39:51 hier kernel: [79872.773827] Buffer I/O error on device dm-0, logical block 148009569 May 25 12:39:51 hier kernel: [79872.773829] Buffer I/O error on device dm-0, logical block 148009570 May 25 12:39:51 hier kernel: [79872.773830] Buffer I/O error on device dm-0, logical block 148009571 May 25 12:39:51 hier kernel: [79872.773832] Buffer I/O error on device dm-0, logical block 148009572 May 25 12:39:51 hier kernel: [79872.773834] Buffer I/O error on device dm-0, logical block 148009573 May 25 12:39:51 hier kernel: [79872.773836] Buffer I/O error on device dm-0, logical block 148009574 May 25 12:39:51 hier kernel: [79872.773838] Buffer I/O error on device dm-0, logical block 148009575 May 25 12:39:51 hier kernel: [79872.773840] Buffer I/O error on device dm-0, logical block 148009576 May 25 12:39:51 hier kernel: [79872.773841] Buffer I/O error on device dm-0, logical block 148009577 May 25 12:39:51 hier kernel: [79872.773898] EXT4-fs warning (device dm-0): ext4_end_bio:317: I/O error -19 writing to inode 17173338 (offset 2490368 size 131072 starting block 148009598) May 25 12:39:51 hier kernel: [79872.773948] EXT4-fs warning (device dm-0): ext4_end_bio:317: I/O error -19 writing to inode 40770054 (offset 3145728 size 8192 starting block 147922176) May 25 12:39:51 hier kernel: [79872.773975] EXT4-fs warning (device dm-0): ext4_end_bio:317: I/O error -19 writing to inode 40770076 (offset 3887104 size 4096 starting block 5281717) May 25 12:39:51 hier kernel: [79873.230984] EXT4-fs error (device dm-0): ext4_put_super:790: Couldn't clean up the journal May 25 12:39:51 hier kernel: [79873.230990] EXT4-fs (dm-0): Remounting filesystem read-only May 25 12:39:51 hier kernel: [79873.230992] EXT4-fs (dm-0): previous I/O error to superblock detected May 25 12:44:07 hier kernel: [80128.859704] EXT4-fs (dm-0): recovery complete May 25 12:44:07 hier kernel: [80128.878030] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null) Which layer is saying "Buffer I/O error on device dm-0, logical block 0" ? Is it the LUKS module? After one more round of trying to do a backup and failing I decided to fsck the partition (the partition was unlocked via LUKS then): # fsck.ext4 -c -c -k /dev/mapper/luks-446c3a20-eb34-45f6-9ff9-d4a5b17fdedd That gave me after a while of running without error reports a never ending series of fsck wanting me to confirm its actions. Stuff like: die Block-Bitmap (24117248) von Gruppe 736 ist ungültig. Zurücksetzen<j>? ja Since I mount that disk with `gnome-disk` I noticed there, that the lock on the LUKS encrypted parition had in the meantime become *closed* again, without me doing anything. Nevertheless fsck was happily continuing with its disk check. So I think there are a few parts broken in this chain of layers. The one that I can put a finger on is that fsck should notice or should be notified when the block device under it ceases to exist, as is the case when the LUKS device becomes locked again. I'm not sure why fsck doesn't notice. Doesn't it get the right information from the LUKS block device? The end result of this is, that my backups are lost. The disk can still be read, but LUKS is no more able to decipher it. I'm not sure how to go forward from here. I know that I had a series of problems with external USB drives allready: * I've had to have my USB port re-soldered (one year ago) because contact with attached devices had become unreliable - they would disapear and reappear in the kernel log. After having been resoldered, the port was reliable. * ca. 7 months ago I had to throw away an external USB drive because again it would be disappearing in `gnome-disk` after a few minutes of doings backups onto it (same setup as today) The current drive and the current re-soldered port were working reliably since then though. So I'm not sure which parts of the chain are broken now. However I think that: 1. the information of "the device is gone" or "has problems" should be logged by every layer from the bottom up to the top However I am only able to see the USB layer do that: "usb 4-1.1: USB disconnect, device number 7" I can't see any log message from the usb-storage that would let me know that the device has disappeared. Neither can I see LUKS telling me that it is relocking the device (which `gnome-disk` is telling me it did). Nor can I see ext4 telling me that the block device under it is gone. 2. the information of "the device is gone" should be passed from the first layer that has that problem up through all following layers. Why doesn't this apparently happen? What's the way forward to fix this? *t _______________________________________________ Ext3-users mailing list Ext3-users@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/ext3-users