Re: Troubleshooting "Buffer I/O error" on reading md device

NeilBrown <neilb@xxxxxxxx> · Fri, 02 Feb 2018 12:55:53 +1100

On Sat, Jan 13 2018, RQM wrote:

> Hello,
>
> I have been made aware that the link I had supplied previously does not work anymore.
> Here's another attempt at uploading the `mdadm --dump /dev/sd[bcdef]3` output:
>
> https://filebin.net/i0olmgzg52obnp0f/dump.tgz
> 
> Any help is greatly appreciated. Please do let me know whether you plan on working on this issue in the near future, because otherwise I will have to re-create a new array on these disks in order to put them into production again.
>
> Thank you so much!

Sorry that it has taken me so long to get to this - January was a bit
crazy.

Short answer is that if you use
  --assemble --update=force-no-bbl
it will really truly get rid of the bad block log.  I really should add
that to the man page.

Longer answer:
If you assemble the array (without force-no-bbl) and

  grep . /sys/block/md0/md/rd*/bad_blocks

you'll get

 /sys/block/md0/md/rd2/bad_blocks:3196060416 8
 /sys/block/md0/md/rd3/bad_blocks:3196060416 8
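
Each line of that output is a start sector followed by a length in
sectors.  As a rough sketch only (parse_bad_blocks is a made-up helper,
not part of mdadm, and it assumes 512-byte sectors), the lines can be
decoded like this:

```python
def parse_bad_blocks(line, sector_bytes=512):
    """Split one 'path:start length' line from the grep output above.

    Assumes both numbers are counted in 512-byte sectors.
    """
    path, entry = line.strip().split(":", 1)
    start, length = (int(field) for field in entry.split())
    return path, start, length, length * sector_bytes


if __name__ == "__main__":
    sample = " /sys/block/md0/md/rd2/bad_blocks:3196060416 8"
    path, start, length, size = parse_bad_blocks(sample)
    print(path, "start sector", start, "length", length, "sectors =", size, "bytes")
```

A length of 8 sectors is 4096 bytes.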

So that is a 4K block that is bad at the same location on 2 devices.
There is no data offset, and the chunk size is 64K, so using bc:

% bc
3196060416/(64*2)
24969222
3196060416%(64*2)
0

the blocks are at the start of stripe 24969222.
Each stripe is 4 data chunks, and a chunk is 64K or 16 4K blocks.
So the block offset is close to

% bc
24969222*4*16
1598030208

which is exactly the "logical block" which was reported.
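
For anyone who wants to repeat this for other entries, here is the same
arithmetic as a small Python sketch.  The function name and constants
are only illustrative; it assumes 512-byte sectors, a 64K chunk, 4 data
chunks per stripe and no data offset, as in this array.

```python
SECTOR_BYTES = 512
CHUNK_BYTES = 64 * 1024
BLOCK_BYTES = 4096
DATA_CHUNKS = 4  # 5 devices: 4 data chunks plus 1 parity chunk per stripe


def locate(bad_sector):
    """Map a per-device bad sector to (stripe, sector offset in chunk,
    first logical 4K block of that stripe)."""
    sectors_per_chunk = CHUNK_BYTES // SECTOR_BYTES  # 128
    blocks_per_chunk = CHUNK_BYTES // BLOCK_BYTES    # 16
    stripe, offset = divmod(bad_sector, sectors_per_chunk)
    first_block = stripe * DATA_CHUNKS * blocks_per_chunk
    return stripe, offset, first_block


if __name__ == "__main__":
    print(locate(3196060416))  # (24969222, 0, 1598030208)
```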

There are 5 devices, so the parity block rotates through the pattern

D0 D1 D2 D3 P
D1 D2 D3 P  D0
D2 D3 P  D0 D1
D3 P  D0 D1 D2
P  D0 D1 D2 D3

% bc
24969222%5
2

So this should be row 2 (counting from 0)
D2 D3 P  D0 D1

rd2 and rd3 are bad, so that is 'P' and 'D0'.
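
The rotation can also be written down as a short Python sketch.  This
only reproduces the 5-row pattern shown above and assumes that rdN
corresponds to column N of that pattern; row_layout is a made-up name,
not anything from the md code.

```python
def row_layout(stripe, ndev=5):
    """Return the role of each device column for the given stripe,
    following the pattern above: parity moves one column left per
    stripe and D0 sits just after it, wrapping around."""
    row = stripe % ndev
    parity = ndev - 1 - row
    roles = [None] * ndev
    roles[parity] = "P"
    for d in range(ndev - 1):
        roles[(parity + 1 + d) % ndev] = "D%d" % d
    return roles


if __name__ == "__main__":
    roles = row_layout(24969222)
    print(roles)               # ['D2', 'D3', 'P', 'D0', 'D1']
    print(roles[2], roles[3])  # P D0
```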

So this confirms that it is just the first 4K block of that stripe which
is bad.
Writing should fix it... but it doesn't.  The write gets an IO error.

Looking at the code I can see why.  The fix isn't completely
trivial. I'll have to think about it carefully.

But for now --update=force-no-bbl should get you going.

NeilBrown
