possible bug in md

Iordan Iordanov <iordan@xxxxxxxxxxxxxxx> · Mon, 04 Jul 2011 12:26:14 -0400

Hi,

I was doing some testing with an Ubuntu 10.04 installation (Linux 
2.6.32, so my apologies if this has been noted and dealt with already), 
and I noticed what I think may be a bug.

I had a system with RAID10, layout n2, where /dev/sda is one of the 
devices, and the other is "missing". I wanted to add /dev/sdb to the 
RAID10 array. Both drives are on their last legs (they have bad sectors), 
and I was only doing a proof of concept for a guide I was writing, so I 
did not care about losing the data on them.

Here are the relevant dmesg messages for the drives detected:
====================================================
ata1.00: ATA-5: IC35L040AVER07-0, ER4OA44A, max UDMA/100
ata1.00: 80418240 sectors, multi 16: LBA
ata1.01: ATA-6: Maxtor 94610H6, BAC51KJ0, max UDMA/100
ata1.01: 90045648 sectors, multi 16: LBA
====================================================

On the system, ata1.00 is an IBM drive (/dev/sda), and ata1.01 is a 
Maxtor drive (/dev/sdb). I have RAID10 (/dev/md0) on ata1.00 (/dev/sda) 
and one "missing" device. I added the Maxtor (ata1.01, /dev/sdb), and 
during the sync, an error occurred on ata1.00, which is the first disk 
of the RAID10 array (the IBM, /dev/sda). However, md wrongly reports 
that an error has occurred on the device I had just ADDED (the Maxtor):

====================================================
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: BMDMA stat 0x65
ata1.00: failed command: READ DMA
ata1.00: cmd c8/00:00:00:e5:7b/00:00:00:00:00/e2 tag 0 dma 131072 in
         res 51/40:39:c7:e5:7b/00:00:00:00:00/e2 Emask 0x9 (media error)
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1.00: configured for UDMA/100
ata1.01: configured for UDMA/100
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: BMDMA stat 0x65
ata1.00: failed command: READ DMA
ata1.00: cmd c8/00:00:00:e5:7b/00:00:00:00:00/e2 tag 0 dma 131072 in
         res 51/40:39:c7:e5:7b/00:00:00:00:00/e2 Emask 0x9 (media error)
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1.00: configured for UDMA/100
ata1.01: configured for UDMA/100
sd 0:0:0:0: [sda] Unhandled sense code
sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 0:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
        02 7b e5 c7
sd 0:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
sd 0:0:0:0: [sda] CDB: Read(10): 28 00 02 7b e5 00 00 01 00 00
end_request: I/O error, dev sda, sector 41674183
ata1: EH complete
md: md0: recovery done.
raid10: Disk failure on sdb, disabling device.
raid10: Operation continuing on 1 devices.
RAID10 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:sda
 disk 1, wo:1, o:0, dev:sdb
RAID10 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:sda
====================================================
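
As a cross-check that the failing read belongs to /dev/sda, the numbers in
the log above can be decoded by hand. The sketch below is only illustrative:
it assumes 512-byte logical sectors and the usual READ(10) CDB and
descriptor-format sense layouts, and it copies the bytes verbatim from the
log.

```python
# Bytes copied from the dmesg output above.
cdb = bytes.fromhex("28 00 02 7b e5 00 00 01 00 00")
sense = bytes.fromhex(
    "72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 02 7b e5 c7"
)

SECTOR_BYTES = 512        # assumption: 512-byte logical sectors
SDA_SECTORS = 80418240    # "ata1.00: 80418240 sectors" from the log

# READ(10): bytes 2-5 are the starting LBA, bytes 7-8 the sector count
# (both big-endian).
start_lba = int.from_bytes(cdb[2:6], "big")
num_sectors = int.from_bytes(cdb[7:9], "big")

# Descriptor sense: 8-byte header, then an information descriptor whose
# last 8 bytes carry the failing LBA (assumed layout).
failed_lba = int.from_bytes(sense[12:20], "big")

print("request start :", start_lba)
print("request bytes :", num_sectors * SECTOR_BYTES)
print("failing sector:", failed_lba)
print("offset in req :", failed_lba - start_lba)
print("inside sda    :", failed_lba < SDA_SECTORS)
```

Under those assumptions the request starts at sector 41673984 and covers
256 sectors (131072 bytes, matching "dma 131072" in the log). The failing
sector 41674183 sits 199 sectors into it, well inside the 80418240 sectors
reported for ata1.00.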

The relevant lines are the ones that show the errors on ata1.00 (the 
IBM), and then the line which reports disk failure on /dev/sdb (ata1.01):

raid10: Disk failure on sdb, disabling device.
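
The mismatch can also be spotted mechanically. The following is a rough
sketch, not a general tool: the helper name is made up for illustration, and
the patterns only match the message wording seen in this 2.6.32 log.

```python
import re

def devices_in_log(log_text):
    """Return (devices with block I/O errors, devices raid10 marked failed)."""
    io_errors = set(re.findall(r"end_request: I/O error, dev (\w+)", log_text))
    md_failed = set(re.findall(r"raid10: Disk failure on (\w+)", log_text))
    return io_errors, md_failed

# Three lines copied from the dmesg output above.
sample = """\
end_request: I/O error, dev sda, sector 41674183
md: md0: recovery done.
raid10: Disk failure on sdb, disabling device.
"""

io_errors, md_failed = devices_in_log(sample)

# Devices that md disabled although no I/O error was logged against them.
suspicious = md_failed - io_errors
print("I/O errors on:", sorted(io_errors))
print("md failed    :", sorted(md_failed))
print("suspicious   :", sorted(suspicious))
```

On this log the only device with an I/O error is sda, while the only device
md disabled is sdb.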

Sincerely,
Iordan Iordanov