On Thu, Nov 12 2015, matt@xxxxxxxxxxxxxxxxxxx wrote:

> Hello,
>
> I posted a while back about getting buffer I/O errors in my dmesg logs
> for my raid array, something along the lines of this:
>
> [158219.456484] EXT4-fs warning (device md4): ext4_end_bio:329: I/O error -5 writing to inode 125274714 (offset 176160768 size 8388608 starting block 4955235712)
> [158219.456487] Buffer I/O error on device md4, logical block 4955235584
> [158219.456490] Buffer I/O error on device md4, logical block 4955235585
> [158219.456491] Buffer I/O error on device md4, logical block 4955235586
> [158219.456491] Buffer I/O error on device md4, logical block 4955235587
> [158219.456492] Buffer I/O error on device md4, logical block 4955235588
> [158219.456493] Buffer I/O error on device md4, logical block 4955235589
> [158219.456494] Buffer I/O error on device md4, logical block 4955235590
> [158219.456495] Buffer I/O error on device md4, logical block 4955235591
> [158219.456496] Buffer I/O error on device md4, logical block 4955235592
> [158219.456497] Buffer I/O error on device md4, logical block 4955235593
> [158219.456580] EXT4-fs warning (device md4): ext4_end_bio:329: I/O error -5 writing to inode 125274714 (offset 176160768 size 8388608 starting block 4955235456)
> [158219.456663] EXT4-fs warning (device md4): ext4_end_bio:329: I/O error -5 writing to inode 125274714 (offset 176160768 size 8388608 starting block 4955235200)
> [158219.456747] EXT4-fs warning (device md4): ext4_end_bio:329: I/O error -5 writing to inode 125274714 (offset 176160768 size 8388608 starting block 4955234944)
> [158219.456829] EXT4-fs warning (device md4): ext4_end_bio:329: I/O error -5 writing to inode 125274714 (offset 176160768 size 8388608 starting block 4955234688)
> [158219.456912] EXT4-fs warning (device md4): ext4_end_bio:329: I/O error -5 writing to inode 125274714 (offset 176160768 size 8388608 starting block 4955234432)
> [158469.158278] EXT4-fs warning (device md4): ext4_end_bio:329: I/O error -5 writing to inode 123995503 (offset 0 size 8388608 starting block 4970080384)
> [158469.158281] buffer_io_error: 1526 callbacks suppressed
>
> I am now using the latest mainline kernel, 4.3.0, and I believe
> something is going wrong with the badblocks implementation.
>
> I originally had 3 drives, all with the same badblocks list. This
> array has been running a while, so I have no idea how these 3 discs
> all ended up with the same list of badblocks.
>
> Now, if I remove any drive which has no badblock entries and re-add
> it, once the sync is complete I end up with another drive with the
> same badblocks list.

An entry in the bad-blocks list means that the data at that location is
not available, possibly because the block is bad.

If you have a degraded RAID6 where any block appears in 2 or more
bad-blocks lists, then it is not possible to recover the data at that
address while a spare is being recovered, so the same address will be
added to the bad-block log on the spare.

You could remove the bad block from all the devices by writing to all
of the affected blocks at once, but that is admittedly a little
difficult to manage. I probably need to make it possible to clear the
bad-block log with a successful write to just a single data block (and
the matching parity blocks). I've added that to my to-do list.

I've just pushed out a modification to mdadm so you can run

    mdadm --assemble --update=force-no-bbl /dev/md/whatever list-of-devices

and it will remove the bad-block lists even though they are not empty.
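Before doing that, if you want to see what each member currently
records, something like this should show it (the device names below are
just examples, and --examine-badblocks needs a reasonably recent mdadm):

    # Print the bad-block log stored in a member's superblock
    mdadm --examine-badblocks /dev/sda4
    # Or read the kernel's live list for an active member of md4
    cat /sys/block/md4/md/dev-sda4/bad_blocks

And if you do try the write-to-every-affected-block route, a raw
rewrite of one reported region would look roughly like the sketch
below. It assumes the 4KiB filesystem block size that ext4 usually uses
(check with dumpe2fs), and it writes zeros straight over whatever is
stored there, bypassing the filesystem, so only do it on blocks you
have confirmed are free or disposable, ideally with the filesystem
unmounted:

    # Rewrite ten 4KiB blocks starting at the first logical block
    # reported above; the offset and count are illustrative only.
    dd if=/dev/zero of=/dev/md4 bs=4096 seek=4955235584 count=10 oflag=direct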
So if you

    git clone git://neil.brown.name/mdadm
    cd mdadm
    make
    ./mdadm --stop /dev/md4
    ./mdadm --assemble /dev/md4 --update=force-no-bbl list-of-devices

it should get rid of your problem.

However, as your mail is 6 weeks old (I was on leave...) maybe you have
already found another solution.

NeilBrown