On Thu, Nov 12 2015, matt@xxxxxxxxxxxxxxxxxxx wrote:

> Hello,
>
> I posted a while back about getting buffer I/O errors in my dmesg logs
> for my raid array, something along the lines of this:
>
> [158219.456484] EXT4-fs warning (device md4): ext4_end_bio:329: I/O error -5 writing to inode 125274714 (offset 176160768 size 8388608 starting block 4955235712)
> [158219.456487] Buffer I/O error on device md4, logical block 4955235584
> [158219.456490] Buffer I/O error on device md4, logical block 4955235585
> [158219.456491] Buffer I/O error on device md4, logical block 4955235586
> [158219.456491] Buffer I/O error on device md4, logical block 4955235587
> [158219.456492] Buffer I/O error on device md4, logical block 4955235588
> [158219.456493] Buffer I/O error on device md4, logical block 4955235589
> [158219.456494] Buffer I/O error on device md4, logical block 4955235590
> [158219.456495] Buffer I/O error on device md4, logical block 4955235591
> [158219.456496] Buffer I/O error on device md4, logical block 4955235592
> [158219.456497] Buffer I/O error on device md4, logical block 4955235593
> [158219.456580] EXT4-fs warning (device md4): ext4_end_bio:329: I/O error -5 writing to inode 125274714 (offset 176160768 size 8388608 starting block 4955235456)
> [158219.456663] EXT4-fs warning (device md4): ext4_end_bio:329: I/O error -5 writing to inode 125274714 (offset 176160768 size 8388608 starting block 4955235200)
> [158219.456747] EXT4-fs warning (device md4): ext4_end_bio:329: I/O error -5 writing to inode 125274714 (offset 176160768 size 8388608 starting block 4955234944)
> [158219.456829] EXT4-fs warning (device md4): ext4_end_bio:329: I/O error -5 writing to inode 125274714 (offset 176160768 size 8388608 starting block 4955234688)
> [158219.456912] EXT4-fs warning (device md4): ext4_end_bio:329: I/O error -5 writing to inode 125274714 (offset 176160768 size 8388608 starting block 4955234432)
> [158469.158278] EXT4-fs warning (device md4): ext4_end_bio:329: I/O error -5 writing to inode 123995503 (offset 0 size 8388608 starting block 4970080384)
> [158469.158281] buffer_io_error: 1526 callbacks suppressed
>
> I am now using the latest mainline kernel, 4.3.0, and I believe
> something is going wrong with the badblocks implementation.
>
> I originally had 3 drives, all with the same badblocks list. This
> array has been running a while, so I have no idea how these 3 discs
> all ended up with the same list of badblocks.
>
> Now, if I remove any drive which has no badblock entries and re-add
> it, once the sync is complete I end up with another drive with the
> same badblocks list.

An entry in the bad-blocks list means that the data at that location is
not available, possibly because the block is bad.

If you have a degraded RAID6 where any block appears in 2 or more
bad-blocks lists, then it is not possible to recover the data at that
address while a spare is being recovered, so the same address will be
added to the bad-block log on the spare.

You could remove the bad block from all the devices by writing to all
of the affected blocks at once, but that is admittedly a little
difficult to manage. I probably need to make it possible to clear the
bad-block log with a successful write to just a single data block (and
the matching parity blocks). I've added that to my to-do list.

I've just pushed out a modification to mdadm so you can run

    mdadm --assemble --update=force-no-bbl /dev/md/whatever list-of-devices

and it will remove the bad-block lists even though they are not empty.
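Before doing that, if you want to see what each member currently
records, something like this should show it (the device names below are
just examples, and --examine-badblocks needs a reasonably recent mdadm):

    # Print the bad-block log stored in a member's superblock
    mdadm --examine-badblocks /dev/sda4
    # Or read the kernel's live list for an active member of md4
    cat /sys/block/md4/md/dev-sda4/bad_blocks

And if you do try the write-to-every-affected-block route, a raw
rewrite of one reported region would look roughly like the sketch
below. It assumes the 4KiB filesystem block size that ext4 usually uses
(check with dumpe2fs), and it writes zeros straight over whatever is
stored there, bypassing the filesystem, so only do it on blocks you
have confirmed are free or disposable, ideally with the filesystem
unmounted:

    # Rewrite ten 4KiB blocks starting at the first logical block
    # reported above; the offset and count are illustrative only.
    dd if=/dev/zero of=/dev/md4 bs=4096 seek=4955235584 count=10 oflag=direct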
So if you

    git clone git://neil.brown.name/mdadm
    cd mdadm
    make
    ./mdadm --stop /dev/md4
    ./mdadm --assemble /dev/md4 --update=force-no-bbl list-of-devices

it should get rid of your problem.

However, as your mail is 6 weeks old (I was on leave...) maybe you have
already found another solution.

NeilBrown