On 08/02/15 08:47, Roman Mamedov wrote:
Hello,

I've got some bad sectors on one drive:

dd: reading `/dev/sdh1': Input/output error
260200+0 records in
260200+0 records out
133222400 bytes (133 MB) copied, 2.97188 s, 44.8 MB/s

[ 3908.350331] ata9.00: exception Emask 0x0 SAct 0x40000 SErr 0x0 action 0x0
[ 3908.350385] ata9.00: irq_stat 0x40000008
[ 3908.350427] ata9.00: failed command: READ FPDMA QUEUED
[ 3908.350474] ata9.00: cmd 60/06:90:6a:00:04/00:00:00:00:00/40 tag 18 ncq 3072 in
[ 3908.350474]          res 51/40:06:6a:00:04/00:00:00:00:00/40 Emask 0x409 (media error) <F>
[ 3908.350628] ata9.00: status: { DRDY ERR }
[ 3908.350669] ata9.00: error: { UNC }
[ 3908.354643] ata9.00: configured for UDMA/133
[ 3908.354664] sd 8:0:0:0: [sdh] Unhandled sense code
[ 3908.354668] sd 8:0:0:0: [sdh]
[ 3908.354671] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 3908.354674] sd 8:0:0:0: [sdh]
[ 3908.354677] Sense Key : Medium Error [current] [descriptor]
[ 3908.354681] Descriptor sense data with sense descriptors (in hex):
[ 3908.354683]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 3908.354695]         00 04 00 6a
[ 3908.354701] sd 8:0:0:0: [sdh]
[ 3908.354705] Add. Sense: Unrecovered read error - auto reallocate failed
[ 3908.354708] sd 8:0:0:0: [sdh] CDB:
[ 3908.354710] Read(10): 28 00 00 04 00 6a 00 00 06 00
[ 3908.354721] end_request: I/O error, dev sdh, sector 262250
[ 3908.354773] Buffer I/O error on device sdh1, logical block 260202
[ 3908.354825] Buffer I/O error on device sdh1, logical block 260203
[ 3908.354891] Buffer I/O error on device sdh1, logical block 260204
[ 3908.354942] Buffer I/O error on device sdh1, logical block 260205
[ 3908.354992] Buffer I/O error on device sdh1, logical block 260206
[ 3908.355042] Buffer I/O error on device sdh1, logical block 260207
[ 3908.355125] ata9: EH complete

Generally I believe these should go away when overwritten, but how do I overwrite them?

The drive is an md RAID1 member:

/dev/md4:
        Version : 1.2
  Creation Time : Mon May 26 13:40:18 2014
     Raid Level : raid1
     Array Size : 1953379936 (1862.89 GiB 2000.26 GB)
  Used Dev Size : 1953379936 (1862.89 GiB 2000.26 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Sun Feb 8 02:39:58 2015
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : natsu.romanrm.net:4 (local to host natsu.romanrm.net)
           UUID : 3b8c3166:073249b5:e1384bd6:4611df90
         Events : 50426

    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync   /dev/sdd1
       1       8      113        1      active sync   /dev/sdh1

I thought I would run a 'check' or 'repair', this will read from both drives, fail to read from sdh, then try to overwrite the affected areas on sdh. But nope:

# echo 0 > /sys/block/md4/md/sync_min
# echo check > /sys/block/md4/md/sync_action

[ 4059.451036] md: data-check of RAID array md4
[ 4059.451040] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[ 4059.451042] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
[ 4059.451046] md: using 128k window, over a total of 1953379936k.

This happily proceeds through the supposedly unreadable area:

md4 : active raid1 sdd1[0] sdh1[1]
      1953379936 blocks super 1.2 [2/2] [UU]
      [>....................]  check =  0.0% (1479680/1953379936) finish=1116.8min speed=29128K/sec
      bitmap: 2/8 pages [8KB], 131072KB chunk

at 1.5GB already, while the unreadable sectors are at ~133MB. And no new ATA errors in dmesg. How is this possible?

If I retry the 'dd' command right now, it fails exactly in the same way as before (and ATA errors do indeed appear).

Hi,

I had a similar situation. In my case the bad sectors fell in an unused control area, part of the header, which md does not read (or write) in normal operation or during a sync. The error did not show up during normal operation (or during a scrub), only during the smartctl long test. What triggered the error for you?

I arrived at that conclusion by looking up the sizes of the different parts of the RAID. Dumping the sectors around the bad area also showed them to be all zeroes.

I ended up directly zeroing the bad sectors (hdparm --repair-sector ...).

YMMV

--
Eyal Lebedinsky (eyal@xxxxxxxxxxxxxx)
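
The layout check described in this reply can be roughed out from the numbers in the kernel log quoted above. The following is only a sketch: the sector and block numbers are copied from that log, but the Data Offset is NOT given anywhere in the thread. The value 262144 sectors used here is an assumption (a common value for 1.2 metadata), and the real one should be read from the "Data Offset" line of `mdadm --examine /dev/sdh1`. The 512-byte sector size is also assumed, although it agrees with the dd byte count.

```shell
#!/bin/sh
# Values copied from the kernel log in the original report.
disk_sector=262250           # "end_request: I/O error, dev sdh, sector 262250"
part_block=260202            # "Buffer I/O error on device sdh1, logical block 260202"
bad_count=6                  # the failed Read(10) requested 6 sectors
cdb_lba=$((0x0004006a))      # LBA bytes "00 04 00 6a" from the Read(10) CDB

# ASSUMPTIONS, not taken from the thread:
sector_size=512              # 512-byte logical sectors
data_offset=262144           # md Data Offset in sectors; check "mdadm --examine"

# The whole-disk sector minus the partition-relative block gives the partition start.
part_start=$((disk_sector - part_block))

first_bad=$part_block
last_bad=$((part_block + bad_count - 1))
byte_offset=$((first_bad * sector_size))

# Array data begins at data_offset within the member partition, so anything
# entirely below it is outside the region that a check or repair reads.
if [ "$last_bad" -lt "$data_offset" ]; then
    verdict="before the md data area"
else
    verdict="at least partly inside the md data area"
fi

echo "CDB LBA $cdb_lba, kernel-reported sector $disk_sector"
echo "partition starts at disk sector $part_start"
echo "bad sectors $first_bad..$last_bad of sdh1 (byte offset $byte_offset)"
echo "with Data Offset $data_offset they fall $verdict"
```

If the real Data Offset does turn out to be larger than the last bad sector number, that would be consistent with the check running past ~133MB without touching the unreadable sectors, since md would never read that part of the partition.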