raid1 means that the two disks hold the same data at the same offset, so dd from the member device should confirm the address and trigger an error. Your calculation looks correct to me (IANA Expert).

I do not understand why you "can't run badblocks" on the individual device. Only running it on the md device causes it to be "cover"ed.

BTW, I would expect the periodic smartd long test to report the bad blocks earlier, and the periodic md 'check' to fix such problems.

HTH,
Eyal

On 14/03/14 08:23, Thorsten von Eicken wrote:
I've just had a disk in a raid1 mirror set die and that exposed some bad block on the remaining drive -- ooops! I'd now like to map the bad blocks to files so I can restore the affected files from backups, but I can't figure out the mapping.

What I've done:
- while I had only one drive, I ran badblocks on the md device, that gave me a list of bad blocks, I verified that they were indeed bad with dd, and I used debugfs to map them to files, verified the files gave a read error, wrote zeroes to the blocks to get the drive to reallocate, done!
- I rebuilt the mirror set with a fresh drive, but that exposed two more bad blocks, ouch!
- Now I can't run badblocks anymore easily because the second drive will "cover" for the first one as far as I understand
- I've tried converting sectors to blocks, subtracting partition offset and data offset, but it's just not working, i.e. I can't get dd to hit the error when I try

This is the set of bad blocks I'm trying to deal with:

Mar 13 01:12:24 h kernel: [522839.210723] end_request: I/O error, dev sda, sector 1147023664
Mar 13 01:12:24 h kernel: [522839.211618] ata1: EH complete
Mar 13 01:12:24 h kernel: [522839.211635] md/raid1:md0: sda: unrecoverable I/O read error for block 1146085632
Mar 13 01:11:49 h kernel: [522804.763467] end_request: I/O error, dev sda, sector 1147020360
Mar 13 01:11:49 h kernel: [522804.765146] ata1: EH complete
Mar 13 01:11:49 h kernel: [522804.765180] md/raid1:md0: sda: unrecoverable I/O read error for block 1146082304
Mar 13 01:11:30 h kernel: [522785.944248] end_request: I/O error, dev sda, sector 1147017056
Mar 13 01:11:30 h kernel: [522785.945926] ata1: EH complete
Mar 13 01:11:30 h kernel: [522785.945961] md/raid1:md0: sda: unrecoverable I/O read error for block 1146078976
Mar 13 01:09:23 h kernel: [522658.983066] end_request: I/O error, dev sda, sector 1129050144
Mar 13 01:09:23 h kernel: [522658.984750] ata1: EH complete
Mar 13 01:09:23 h kernel: [522658.984781] md/raid1:md0: sda: unrecoverable I/O read error for block 1128112128
Mar 13 01:06:44 h kernel: [522499.760134] end_request: I/O error, dev sda, sector 1098724456
Mar 13 01:06:44 h kernel: [522499.761829] ata1: EH complete
Mar 13 01:06:44 h kernel: [522499.761869] md/raid1:md0: sda: unrecoverable I/O read error for block 1097786368

The GPT is:

Disk /dev/sda: 3907029168 sectors, 1.8 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): FD188278-C58A-41C7-9943-AD5E94EDF1F8
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 3907029134
Partitions will be aligned on 2048-sector boundaries
Total free space is 4061 sectors (2.0 MiB)

Number  Start (sector)    End (sector)  Size        Code  Name
   1            2048          409600    199.0 MiB   EF00  EFI System
   2          411648          935935    256.0 MiB   0700  Microsoft basic data
   3          935936      3907029134    1.8 TiB     FD00  Linux RAID

The md device says:

# mdadm --examine /dev/sda3
/dev/sda3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 1f3118f3:ee5a644b:d8d3df40:8decaa30
           Name : h2:0
  Creation Time : Sun Mar 18 00:37:55 2012
     Raid Level : raid1
   Raid Devices : 2
 Avail Dev Size : 3906091151 (1862.57 GiB 1999.92 GB)
     Array Size : 1953045575 (1862.57 GiB 1999.92 GB)
  Used Dev Size : 3906091150 (1862.57 GiB 1999.92 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 7a87b471:48f6c853:0a5dd5ad:15fa79a4
    Update Time : Thu Mar 13 19:43:14 2014
       Checksum : 1522576c - correct
         Events : 8076588
    Device Role : Active device 1
    Array State : AA ('A' == active, '.' == missing)

- I'm using an ext4 filesystem and dumpfs says:

First block: 0
Block size: 4096

If I take the bad sector number from the first error message (1147023664) subtract the partition start (935936) and then divide by 8 (4096 byte blocks) I can get dd to trigger the bad block:

# dd if=/dev/sda3 of=/dev/null bs=4096 count=10 iflag=direct skip=143260966
dd: reading `/dev/sda3': Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 15.7246 s, 0.0 kB/s

But I need the block number in the md device so I can map it back to a file. I've tried various calculations and used dd on the md device but I see no error in syslog indicating hitting a bad block.

Does someone know how to do the mapping and/or how to fix the situation?

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Eyal Lebedinsky (eyal@xxxxxxxxxxxxxx)
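
For reference, the sector arithmetic discussed in this thread can be written out as a small Python sketch. It assumes the ext4 filesystem sits directly on the md device ("First block: 0") with nothing such as LVM in between, which the thread does not state explicitly. The constants are copied from the GPT listing, the mdadm --examine output and the filesystem block size quoted above; map_sector is a hypothetical helper name used only for illustration.

```python
# Values copied from the thread; the layout assumption (ext4 directly on
# the md device, starting at block 0) is mine and is not confirmed there.
SECTOR_SIZE = 512          # "Logical sector size: 512 bytes"
PARTITION_START = 935936   # start sector of partition 3 (Linux RAID)
DATA_OFFSET = 2048         # "Data Offset : 2048 sectors" from mdadm --examine
FS_BLOCK_SIZE = 4096       # ext4 "Block size: 4096"

SECTORS_PER_BLOCK = FS_BLOCK_SIZE // SECTOR_SIZE  # 8


def map_sector(disk_sector):
    """Map a whole-disk sector to partition and md-device coordinates.

    map_sector is a hypothetical helper, not part of mdadm or e2fsprogs.
    """
    part_sector = disk_sector - PARTITION_START   # sector within the partition
    md_sector = part_sector - DATA_OFFSET         # sector within the md data area
    if md_sector < 0:
        raise ValueError("sector lies before the md data area")
    return {
        "partition_block": part_sector // SECTORS_PER_BLOCK,  # dd skip= on the member
        "md_sector": md_sector,
        "fs_block": md_sector // SECTORS_PER_BLOCK,           # block on the md device
    }


# Sectors from the "end_request: I/O error, dev sda" lines in the thread.
bad_sectors = [1147023664, 1147020360, 1147017056, 1129050144, 1098724456]

for sector in bad_sectors:
    m = map_sector(sector)
    print(f"disk sector {sector}: partition block {m['partition_block']}, "
          f"md sector {m['md_sector']}, fs block {m['fs_block']}")
```

For the first error this gives partition block 143260966, which matches the dd skip value that hit the error on the member device, and block 143260710 on the md device (256 blocks lower, i.e. the 2048-sector data offset divided by 8). That second number is the kind of value one would hand to debugfs (icheck, then ncheck on the resulting inode), as was done for the earlier bad blocks. The computed md sectors come out slightly higher than the "block" values in the md/raid1 log lines; the thread does not say what those values denote, and they may simply be the start of the failing request.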