On Wed, 02 Jul 2014 12:54:34 +0100 Pedro Teixeira <finas@xxxxxxxx> wrote:

> cpu is a phenom x6, 8gb ram. controller is LSI 9201-i16. hdd's are
> seagate sshd ST1000DX001.
>
> So I ran the "dd if=/dev/md0 of=/dev/null bs=4096" command and it
> failed in a lot of places. I had to restart the command several times
> with the skip parameter set to a couple of blocks after the last block
> error. It ran for about 1.5TB of the total 13TB of the volume.
> The md volume didn't drop any drive while this was running.
>
> dmesg showed:
>
> [ 1678.478156] Buffer I/O error on device md0, logical block 196012546

I love numbers, thanks.

The logical block size is 4096 bytes, or 8 sectors (1 sector is defined
as 512 bytes), so this error is at 196012546*8 == 1568100368 sectors into
the array.

The array has a chunk size of 512K, or 1024 sectors, so

   196012546*8/1024 = 1531348.015625

gives us the chunk number, plus the remaining fraction of a chunk.

The RAID6 has 16 devices, so there are 14 data chunks in each stripe. To
find where the above chunk is stored, we divide by 14:

   1531348/14 = 109382.0000

So that is chunk 109382 on the first device (though with rotating data,
it might not be the very first). Add back in the fractional part,
multiply by 1024 sectors per chunk, and add the Data Offset:

   109382.01562500*1024+262144 = 112269328

So it seems that sector 112269328 on some device is bad.

> The command "mdadm --examine-badblocks /dev/sd[bcdefghijklmnopqr] >>
> raid.b" before and after running the "dd" command returned no changes:

I didn't notice the fact that the bad block logs were not empty before,
sorry. Anyway:

> Bad-blocks on /dev/sdb:
>   112269328 for 512 sectors

Look at that - exactly the number I calculated. I love it when that
works out.

So the problem is exactly that some blocks are thought by md to be bad.
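For anyone who wants to repeat the arithmetic above, here is a small
sketch of the same calculation in shell, using integer arithmetic (the
fractional chunk part becomes a sector remainder). The geometry values
are taken from this thread - 4096-byte logical blocks, 512K chunks, a
16-device RAID6 (14 data chunks per stripe), and a Data Offset of 262144
sectors - so adjust them for any other array:

```shell
# Map a failing md logical block to a sector on a member device.
# Values below are the ones from this thread; change them for your array.
LBLOCK=196012546                        # logical block from dmesg
SECTORS=$(( LBLOCK * 8 ))               # 4096-byte blocks -> 512-byte sectors
CHUNK=$(( SECTORS / 1024 ))             # 512K chunk = 1024 sectors
OFFSET_IN_CHUNK=$(( SECTORS % 1024 ))   # position within the chunk
DEV_CHUNK=$(( CHUNK / 14 ))             # 14 data chunks per 16-device RAID6 stripe
DEV_SECTOR=$(( DEV_CHUNK * 1024 + OFFSET_IN_CHUNK + 262144 ))  # + Data Offset
echo "$DEV_SECTOR"                      # prints 112269328
```

The result matches the entry in the /dev/sdb bad-block list below. Note
this gives the sector on *some* member device; with rotating parity the
device holding that chunk is not necessarily the first one.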
Blocks get recorded as bad (for raid6) when:

 - a 'read' reported an error which could not be fixed, either because
   the array was degraded so the data could not be recovered, or because
   the attempt to write the restored data failed
 - while recovering a spare, the data to be written cannot be found (due
   to errors on other devices)
 - a 'write' request to a device fails

When your array had three failed devices, some reads and writes would
have failed. Maybe that caused the bad blocks to be recorded.

What sort of device failures were they? If a device became completely
inaccessible, then it would not have been possible to record the bad
block information. Can you describe the sequence of events that led to
the three failures?

When you put the array back together, did you --create it, or
--assemble --force?

There isn't an easy way to remove the bad block list, as doing so is
normally asking for data corruption. However, it is probably justified
in your case. As it happens, I included code in the kernel to make it
possible to remove bad blocks from the list - it was intended for
testing only, but I never removed it.

If you run

   sed 's/^/-/' /sys/block/md0/md/dev-sdq/bad_blocks |
   while read a; do
       echo $a > /sys/block/md0/md/dev-sdq/bad_blocks
   done

then it should clear all of the bad blocks recorded on sdq.

You should probably fail/remove the last two devices that you added to
the array before you do this, as they probably don't have properly
up-to-date information, and doing this will cause corruption.

I probably need to think about better ways to handle the bad block
lists.

NeilBrown
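To make the clearing loop above concrete without touching sysfs: each
line of a bad_blocks file is "start-sector length", and the sed prefixes
a "-", which the testing interface interprets as a request to remove
that range. A dry run of just the transform, using the entry reported
for /dev/sdb in this thread:

```shell
# Show what the sed stage would write back for one bad_blocks entry.
# "112269328 512" is the range from this thread; no sysfs write happens here.
printf '112269328 512\n' | sed 's/^/-/'
# prints: -112269328 512
```

Writing that "-112269328 512" line into the device's bad_blocks file is
what actually removes the entry; the while-read loop simply does this
for every recorded range.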