Hi,

I am picking up this thread because I stumbled over the same problem, except that I am running a RAID-1 setup. The server is (still) running Debian/stretch with mdadm 3.4-4+b1.

Basically this is what happens:

Accessing the RAID fails:

% sudo dd if=/dev/md0 of=/dev/null skip=3112437760 count=33554432
dd: error reading '/dev/md0': Input/output error
514936+0 records in
514936+0 records out
263647232 bytes (264 MB, 251 MiB) copied, 0.447983 s, 589 MB/s

dmesg output while trying to access the RAID:

[Tue Nov 1 22:09:59 2022] Buffer I/O error on dev md0, logical block 389119087, async page read
[Tue Nov 1 22:22:01 2022] Buffer I/O error on dev md0, logical block 389119087, async page read

Jumping to the 'logical block':

% sudo blockdev --getbsz /dev/md0
4096
% sudo dd if=/dev/md0 of=/dev/null skip=389119087 bs=4096 count=33554432
dd: error reading '/dev/md0': Input/output error
0+0 records in
0+0 records out
0 bytes copied, 0.000129958 s, 0.0 kB/s

But the underlying disk seemed ok, which was strange:

% sudo dd if=/dev/sdb1 skip=3112437760 count=33554432 of=/dev/null
33554432+0 records in
33554432+0 records out
17179869184 bytes (17 GB, 16 GiB) copied, 112.802 s, 152 MB/s
sudo dd if=/dev/sdb1 skip=3112437760 count=33554432 of=/dev/null 9.18s user 29.80s system 34% cpu 1:52.81 total

Note: through trial and error I found the offset of /dev/md0 to /dev/sdb1 to be 262144 blocks (with block size 512). That's why skip is not the same for both commands.

After a lot of searching I found this thread and yes, there is a bad block log:

% cat /sys/block/md0/md/rd*/bad_blocks
3113214840 8
% sudo mdadm -E /dev/sdb1 | grep Bad
Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.

The other disk of that RAID has been removed because it had SMART errors and is about to be replaced. Only then did I notice the input/output error.

I am not sure how to proceed from here. Do you have any advice?
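For what it's worth, the numbers above appear to line up. Below is a minimal Python sketch of the arithmetic. It assumes that the 262144-sector offset found by trial and error is the data offset of the member device, and that the bad_blocks entry is given in 512-byte sectors counted from the start of /dev/sdb1. Both are assumptions on my side, and the function names are made up for illustration.

```python
# Sketch of the sector arithmetic; all constants are taken from the
# command output quoted above. The meaning of DATA_OFFSET is an assumption.
SECTOR_SIZE = 512        # dd default block size, in bytes
MD_BLOCK_SIZE = 4096     # from `blockdev --getbsz /dev/md0`
DATA_OFFSET = 262144     # offset md0 -> sdb1 in sectors (found by trial and error)


def md_block_to_md_sector(block: int) -> int:
    """Convert a 4096-byte logical block on md0 to a 512-byte sector on md0."""
    return block * (MD_BLOCK_SIZE // SECTOR_SIZE)


def md_sector_to_member_sector(sector: int) -> int:
    """Convert a sector on md0 to a sector on the member device (assumed mapping)."""
    return sector + DATA_OFFSET


# First dd run: started at sector 3112437760, failed after 514936 records.
dd_fail_sector = 3112437760 + 514936

# dmesg reported logical block 389119087 on md0.
dmesg_sector = md_block_to_md_sector(389119087)

# Entry from /sys/block/md0/md/rd*/bad_blocks: start sector and length.
bad_start, bad_len = 3113214840, 8

print(dd_fail_sector, dmesg_sector)
print(md_sector_to_member_sector(dmesg_sector), bad_start)
print(bad_len * SECTOR_SIZE)
```

If those assumptions hold, the dd failure, the dmesg logical block and the bad block log entry all point at the same single 4K block.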
On 2018-02-02 02:55, NeilBrown wrote:
>
> Short answer is that if you use
>   --assemble --force-no-bbl
> it will really truly get rid of the bad block log. I really should add
> that to the man page.

*friendly wave*

> Longer answer:
> If you assemble the array (without force-no-bbl) and
>
> [...]
>
> So this should be row 2 (counting from 0)
>   D2 D3 P D0 D1
>
> rd2 and rd2 are bad, so that is 'P' and 'D0'.
>
> So this confirms that it is just the first 4K block of that stripe which
> is bad.
> Writing should fix it... but it doesn't. The write gets an IO error.
>
> Looking at the code I can see why. The fix isn't completely
> trivial. I'll have think about it carefully.

I am curious: did you come up with a solution?

Best & thx for your help,
- Darsha

P.s. I am not subscribed, please put me on CC.