On Sun Sep 16, 2012 at 12:06:48 +0200, Niccolò Belli wrote:

> On 15/09/2012 21:41, Robin Hill wrote:
> > If md hasn't failed the drive then either:
> >     - md didn't get a read error
> >     - md got a success message when re-writing the block
> >     - there's a bug in md and it's not handled the error at all
>
> It seems it's case one, while manually verifying the checksums with
>
> for i in $(seq 50); do dd if=/dev/sda1 of=sda${i} bs=100000 count=50 skip=$((($i-1)*50+10)) > /dev/null 2> /dev/null; dd if=/dev/sdb1 of=sdb${i} bs=100000 count=50 skip=$((($i-1)*50+10)) > /dev/null 2> /dev/null; md5sum sda${i}; md5sum sdb${i}; echo; done
>
> I get this in syslog:
>
> Sep 15 23:50:09 asterisk kernel: [273828.407914] scsi_verify_blk_ioctl: 30 callbacks suppressed
> Sep 15 23:50:09 asterisk kernel: [273828.407920] dd: sending ioctl 80306d02 to a partition!
> Sep 15 23:50:09 asterisk kernel: [273828.407925] dd: sending ioctl 80306d02 to a partition!
> Sep 15 23:50:10 asterisk kernel: [273829.422247] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> Sep 15 23:50:10 asterisk kernel: [273829.424071] ata3.00: BMDMA stat 0x44
> Sep 15 23:50:10 asterisk kernel: [273829.425855] ata3.00: failed command: READ DMA
> Sep 15 23:50:10 asterisk kernel: [273829.427625] ata3.00: cmd c8/00:00:68:17:00/00:00:00:00:00/e0 tag 0 dma 131072 in
> Sep 15 23:50:10 asterisk kernel: [273829.427627]          res 51/40:00:90:17:00/40:00:00:00:00/e0 Emask 0x9 (media error)
> Sep 15 23:50:10 asterisk kernel: [273829.431184] ata3.00: status: { DRDY ERR }
> Sep 15 23:50:10 asterisk kernel: [273829.432992] ata3.00: error: { UNC }
> Sep 15 23:50:11 asterisk kernel: [273830.404203] ata3.00: configured for UDMA/133
> Sep 15 23:50:11 asterisk kernel: [273830.404217] ata3: EH complete
>
> but this is the output of the command:
>
> b7d4e3c3bb461a1aa6619c22ef11d072  sda1
> b7d4e3c3bb461a1aa6619c22ef11d072  sdb1
> <- snip sets of identical checksums ->
>
> 94f883b45084b72cd9269a4821b2d509  sda50
> 94f883b45084b72cd9269a4821b2d509  sdb50
>
Okay, so it looks like the drive is managing to return the correct data
eventually (or it's returning some default value which has also been
written to the other mirror now).

> *BUT* if I start reading from the start of partition (+0 instead of +10
> in count=) I get a mismatch, on both md0 and md1 (which is supposed to
> be ok)!!!
>
> root@asterisk:~# i=1; dd if=/dev/sda1 of=sda${i} bs=100000 count=50 skip=$((($i-1)*50+0)) > /dev/null 2> /dev/null; dd if=/dev/sdb1 of=sdb${i} bs=100000 count=50 skip=$((($i-1)*50+0)) > /dev/null 2> /dev/null; md5sum sda${i}; md5sum sdb${i}
> 9f9f11ffeb0aed0abc8097417b293f41  sda1
> 394efde218ad700774bfcb3c43255529  sdb1
> root@asterisk:~# i=1; dd if=/dev/sda2 of=sda${i} bs=100000 count=50 skip=$((($i-1)*50+0)) > /dev/null 2> /dev/null; dd if=/dev/sdb2 of=sdb${i} bs=100000 count=50 skip=$((($i-1)*50+0)) > /dev/null 2> /dev/null; md5sum sda${i}; md5sum sdb${i}
> 8cb0b6fa2bf7f0f88a2a2a91598429d4  sda1
> 732c42e14b8e78930d08cdb4f1c49a40  sdb1
>
> Shouldn't raid1 match even at the very beginning of the partition?
>
No, the start of the partition will contain the md superblock (for 1.1
and 1.2 metadata formats), which will be slightly different for the two
devices.

Cheers,
    Robin
-- 
     ___
    ( ' }     |       Robin Hill        <robin@xxxxxxxxxxxxxxx> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |
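
To make the superblock point concrete, here is a minimal sketch of how the two mirror halves could be compared while skipping the per-device metadata at the start. The function name compare_members and its argument order are made up for illustration and are not from this thread. It assumes the offset is given in 512-byte sectors, which is the unit mdadm uses when reporting the data offset for 1.x metadata.

```shell
# Hypothetical helper (not from the thread): checksum COUNT 512-byte sectors
# of two mirror members, starting OFFSET sectors in, and report whether the
# checksums match. The exit status is 0 on a match and non-zero otherwise.
compare_members() {
    dev_a=$1
    dev_b=$2
    offset=$3   # sectors to skip at the start of each member
    count=$4    # sectors to read from each member

    # Read the same region from each member and keep only the md5 digest.
    sum_a=$(dd if="$dev_a" bs=512 skip="$offset" count="$count" 2>/dev/null | md5sum | cut -d' ' -f1)
    sum_b=$(dd if="$dev_b" bs=512 skip="$offset" count="$count" 2>/dev/null | md5sum | cut -d' ' -f1)

    echo "$sum_a  $dev_a"
    echo "$sum_b  $dev_b"
    [ "$sum_a" = "$sum_b" ]
}
```

With 1.1 or 1.2 metadata, the offset to pass can be taken from the "Data Offset" line of `mdadm --examine /dev/sda1`. For example, if that reported 2048 sectors (an assumed value, check your own output), then `compare_members /dev/sda1 /dev/sdb1 2048 10000` would compare only mirrored data. An offset of 0 would include the superblock and is expected to differ between the two devices.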