On Sat, Jan 13 2018, RQM wrote:

> Hello,
>
> I have been made aware that the link I had supplied previously does not work anymore.
> Here's another attempt at uploading the `mdadm --dump /dev/sd[bcdef]3` output:
>
> https://filebin.net/i0olmgzg52obnp0f/dump.tgz
>
> Any help is greatly appreciated. Please do let me know whether you plan on working on this issue in the near future, because otherwise I will have to re-create a new array on these disks in order to put them into production again.
>
> Thank you so much!

Sorry that it has taken me so long to get to this - January was a bit crazy.

Short answer is that if you use --assemble --update=force-no-bbl it will
really truly get rid of the bad block log.  I really should add that to
the man page.

Longer answer:
If you assemble the array (without force-no-bbl) and run

  grep . /sys/block/md0/md/rd*/bad_blocks

you'll get

  /sys/block/md0/md/rd2/bad_blocks:3196060416 8
  /sys/block/md0/md/rd3/bad_blocks:3196060416 8

So that is a 4K block (8 sectors) that is bad at the same location on 2
devices.
There is no data offset, and the chunk size is 64K (64*2 = 128 512-byte
sectors), so using bc:

  % bc
  3196060416/(64*2)
  24969222
  3196060416%(64*2)
  0

the blocks are at the start of stripe 24969222.
Each stripe is 4 data chunks, and a chunk is 64K or 16 4K blocks.  So the
block offset is close to

  % bc
  24969222*4*16
  1598030208

which is exactly the "logical block" which was reported.

There are 5 devices, so the parity block rotates through the pattern

  D0 D1 D2 D3 P
  D1 D2 D3 P  D0
  D2 D3 P  D0 D1
  D3 P  D0 D1 D2
  P  D0 D1 D2 D3

  % bc
  24969222%5
  2

So this should be row 2 (counting from 0):

  D2 D3 P  D0 D1

rd2 and rd3 are bad, so that is 'P' and 'D0'.
So this confirms that it is just the first 4K block of that stripe which
is bad.

Writing should fix it... but it doesn't.  The write gets an IO error.
Looking at the code I can see why.  The fix isn't completely trivial.
I'll have to think about it carefully.
But for now --update=force-no-bbl should get you going.

NeilBrown
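The bc arithmetic in the reply can be replayed as a small Python sketch.
It assumes the geometry stated in the message (5 devices, 64K chunks, no
data offset, 512-byte sectors) and that the parity rotation listed above
is md's default left-symmetric RAID5 layout.  The helper names
`parse_bad_blocks`, `left_symmetric_row` and `describe` are hypothetical,
and the input is the two sysfs lines quoted above rather than a live read
of /sys.

```python
# Sketch: map md bad_blocks entries to stripe, layout row, chunk role and
# logical 4K block.  Geometry below is assumed from the thread, not probed.
SECTOR_BYTES = 512
CHUNK_KIB = 64
N_DEVICES = 5
N_DATA = N_DEVICES - 1                                   # RAID5: one parity chunk per stripe
DATA_OFFSET = 0                                          # "There is no data offset"
SECTORS_PER_CHUNK = CHUNK_KIB * 1024 // SECTOR_BYTES     # 128
SECTORS_PER_BLOCK = 4096 // SECTOR_BYTES                 # 8 sectors per 4K block
BLOCKS_PER_CHUNK = CHUNK_KIB // 4                        # 16 4K blocks per chunk


def parse_bad_blocks(line):
    """Parse 'path:start_sector length' as printed by `grep . .../bad_blocks`."""
    path, rest = line.rsplit(":", 1)
    start, length = (int(x) for x in rest.split())
    dev = path.split("/")[-2]                            # e.g. 'rd2'
    return dev, start, length


def left_symmetric_row(stripe, n=N_DEVICES):
    """Role of each device slot in this stripe, assuming left-symmetric layout.

    Parity moves one slot left per stripe; data chunks D0.. start in the
    slot just after parity and wrap around.
    """
    parity_slot = n - 1 - stripe % n
    roles = [None] * n
    roles[parity_slot] = "P"
    for i in range(n - 1):
        roles[(parity_slot + 1 + i) % n] = f"D{i}"
    return roles


def describe(line):
    """Return (device, stripe, row, role, logical 4K block or None for parity)."""
    dev, start, _length = parse_bad_blocks(line)
    slot = int(dev[2:])                                  # 'rd3' -> 3
    stripe, off = divmod(start - DATA_OFFSET, SECTORS_PER_CHUNK)
    row = stripe % N_DEVICES
    role = left_symmetric_row(stripe)[slot]
    if role == "P":
        return dev, stripe, row, role, None              # parity has no logical address
    data_index = int(role[1:])
    logical = (stripe * N_DATA + data_index) * BLOCKS_PER_CHUNK + off // SECTORS_PER_BLOCK
    return dev, stripe, row, role, logical


SAMPLE = [
    "/sys/block/md0/md/rd2/bad_blocks:3196060416 8",
    "/sys/block/md0/md/rd3/bad_blocks:3196060416 8",
]

if __name__ == "__main__":
    for entry in SAMPLE:
        print(describe(entry))
```

Run against the two quoted lines, rd2 comes out as the parity chunk of
stripe 24969222 (row 2) and rd3 as D0 at logical 4K block 1598030208,
matching the bc results in the reply.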