Wednesday, July 10, 2019, 5:23:41 PM, you wrote:

> On 7/10/19 8:58 AM, Andrey Zhunev wrote:
>> Wednesday, July 10, 2019, 4:26:14 PM, you wrote:
>>
>>> On 7/10/19 4:56 AM, Andrey Zhunev wrote:
>>>> Hello All,
>>>>
>>>> I am struggling to recover my system after a PSU failure, and it was
>>>> suggested that I ask here for support.
>>>>
>>>> One of the hard drives throws some read errors, and that happens to be
>>>> my root drive...
>>>> My system is CentOS 7, and the root partition is a part of LVM.
>>>>
>>>> [root@mgmt ~]# lvscan
>>>>   ACTIVE            '/dev/centos/root' [<98.83 GiB] inherit
>>>>   ACTIVE            '/dev/centos/home' [<638.31 GiB] inherit
>>>>   ACTIVE            '/dev/centos/swap' [<7.52 GiB] inherit
>>>> [root@mgmt ~]#
>>>>
>>>> [root@tftp ~]# file -s /dev/centos/root
>>>> /dev/centos/root: symbolic link to `../dm-3'
>>>> [root@tftp ~]# file -s /dev/centos/home
>>>> /dev/centos/home: symbolic link to `../dm-4'
>>>> [root@tftp ~]# file -s /dev/dm-3
>>>> /dev/dm-3: SGI XFS filesystem data (blksz 4096, inosz 256, v2 dirs)
>>>> [root@tftp ~]# file -s /dev/dm-4
>>>> /dev/dm-4: SGI XFS filesystem data (blksz 4096, inosz 256, v2 dirs)
>>>>
>>>>
>>>> [root@tftp ~]# xfs_repair /dev/centos/root
>>>> Phase 1 - find and verify superblock...
>>>> superblock read failed, offset 53057945600, size 131072, ag 2, rval -1
>>>>
>>>> fatal error -- Input/output error
>>
>>> look at dmesg, see what the kernel says about the read failure.
>>
>>> You might be able to use https://www.gnu.org/software/ddrescue/
>>> to read as many sectors off the device into an image file as possible,
>>> and that image might be enough to work with for recovery.
>>> That would be my first approach:
>>
>>> 1) use ddrescue to create an image file of the device
>>> 2) make a copy of that image file
>>> 3) run xfs_repair -n on the copy to see what it would do
>>> 4) if that looks reasonable run xfs_repair on the copy
>>> 5) mount the copy and see what you get
>>
>>> But if your drive simply cannot be read at all, this is not a filesystem
>>> problem, it is a hardware problem. If this is critical data you may wish
>>> to hire a data recovery service.
>>
>>> -Eric
>>
>>
>> Hi Eric,
>>
>> Thanks for your message!
>> I already started to copy the failing drive with ddrescue. This is a
>> large drive, so it takes some time to complete...
>>
>> When I tried to run xfs_repair on the original (failing) drive,
>> xfs_repair was unable to read the superblock and then just quit
>> with an 'io error'.
>> Do you think it can behave differently on a copied image?

> As I said, look at dmesg to see what failed on the original drive read
> attempt.

> ddrescue will fill unreadable sectors with 0, and then of course that
> can be read from the image file.

Oops, I forgot to paste the error message from dmesg.
Here it is:

Jul 10 11:48:05 mgmt kernel: ata1.00: exception Emask 0x0 SAct 0x180000 SErr 0x0 action 0x0
Jul 10 11:48:05 mgmt kernel: ata1.00: irq_stat 0x40000008
Jul 10 11:48:05 mgmt kernel: ata1.00: failed command: READ FPDMA QUEUED
Jul 10 11:48:05 mgmt kernel: ata1.00: cmd 60/00:98:28:ac:3e/01:00:03:00:00/40 tag 19 ncq 131072 in#012 res 41/40:00:08:ad:3e/00:00:03:00:00/40 Emask 0x409 (media error) <F>
Jul 10 11:48:05 mgmt kernel: ata1.00: status: { DRDY ERR }
Jul 10 11:48:05 mgmt kernel: ata1.00: error: { UNC }
Jul 10 11:48:05 mgmt kernel: ata1.00: configured for UDMA/133
Jul 10 11:48:05 mgmt kernel: sd 0:0:0:0: [sda] tag#19 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 10 11:48:05 mgmt kernel: sd 0:0:0:0: [sda] tag#19 Sense Key : Medium Error [current] [descriptor]
Jul 10 11:48:05 mgmt kernel: sd 0:0:0:0: [sda] tag#19 Add. Sense: Unrecovered read error - auto reallocate failed
Jul 10 11:48:05 mgmt kernel: sd 0:0:0:0: [sda] tag#19 CDB: Read(16) 88 00 00 00 00 00 03 3e ac 28 00 00 01 00 00 00
Jul 10 11:48:05 mgmt kernel: blk_update_request: I/O error, dev sda, sector 54439176
Jul 10 11:48:05 mgmt kernel: ata1: EH complete

There are several of these. At the moment ddrescue reports 22 read
errors (with 35% of the data copied to the new storage).
If I remember correctly, the LVM with my root partition is at the end
of the drive. This means more errors will likely come... :(

The way I interpret the dmesg message, that's just a read error. I'm
not sure, but maybe a complete wipe of the drive will even overwrite /
clear these unreadable sectors.
Well, that's something to be checked after the copy process finishes.

---
Best regards,
 Andrey
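
P.S. A cross-check on the log above, as a minimal bash sketch rather than
anything authoritative. It assumes the usual libata layout of the "cmd" and
"res" strings, where the middle group ends in the three low LBA bytes and the
next group ends in the three high LBA bytes, each printed lowest byte first.
The function name ata_tf_lba is made up for this illustration and is not part
of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the 48-bit LBA from a libata taskfile string
# such as "41/40:00:08:ad:3e/00:00:03:00:00/40" and print it in decimal.
# Assumed field layout:
#   cmd_or_status / feat:nsect:lbal:lbam:lbah / hob_feat:hob_nsect:hob_lbal:hob_lbam:hob_lbah / device
ata_tf_lba() {
    local tf=$1
    local first mid hob dev
    local feat nsect lbal lbam lbah
    local hfeat hnsect hlbal hlbam hlbah

    # Split the string into its four slash-separated groups.
    IFS=/ read -r first mid hob dev <<< "$tf"
    # Split the two middle groups into their colon-separated bytes.
    IFS=: read -r feat nsect lbal lbam lbah <<< "$mid"
    IFS=: read -r hfeat hnsect hlbal hlbam hlbah <<< "$hob"

    # Reassemble the bytes most significant first and convert from hex.
    echo $(( 16#${hlbah}${hlbam}${hlbal}${lbah}${lbam}${lbal} ))
}

# First sector of the failed 256-sector read (from the "cmd" string).
ata_tf_lba "60/00:98:28:ac:3e/01:00:03:00:00/40"   # prints 54438952

# Sector the drive reported as unreadable (from the "res" string).
ata_tf_lba "41/40:00:08:ad:3e/00:00:03:00:00/40"   # prints 54439176
```

Both values agree with the rest of the log: the first matches the LBA bytes
03 3e ac 28 in the Read(16) CDB, and the second matches the sector in the
blk_update_request line. So the unreadable sector sits 224 sectors into the
256-sector (131072-byte) request.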