Re: Troubleshooting "Buffer I/O error" on reading md device

RQM <rqm@xxxxxxxxxxxxxx> · Fri, 05 Jan 2018 07:55:14 -0500

Hi,

here's the metadata dump:
https://filebin.ca/3n9OgaeSlV6x/dump.tgz

When I try assembling with no-bbl, this is what I get:

# mdadm -A /dev/md0 --update=no-bbl /dev/sd[bcdef]3
mdadm: Cannot remove active bbl from /dev/sdc3
mdadm: Cannot remove active bbl from /dev/sde3
mdadm: /dev/md0 has been started with 5 drives.
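
Since mdadm refused to drop the log on sdc3 and sde3, it may help to list what is still recorded on each member. The following is only a sketch: it assumes an mdadm build that supports --examine-badblocks, and both the MDADM override variable and the badblock_report helper are made-up names for illustration.

```shell
# Sketch: list the bad block log entries still recorded on each member.
# MDADM is a hypothetical override so the loop can be dry-run with "echo".
MDADM=${MDADM:-mdadm}

badblock_report() {
    for dev in "$@"; do
        echo "== $dev"
        # Print the bad block log recorded in this component's metadata.
        "$MDADM" --examine-badblocks "$dev"
    done
}
```

Run as root, it would be called as: badblock_report /dev/sd[bcdef]3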

The array does start, but dd reads and writes behave exactly as before:
reads fail, with the corresponding error messages in dmesg and on stdout/stderr;
writes also fail, but the failure shows up only in dmesg.
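
For anyone who wants to repeat the read test, here is a rough sketch. It assumes GNU dd (for iflag=direct), takes the block number from the dmesg line quoted below, and uses DEV, block_offset and read_block as illustrative names only.

```shell
# Sketch: re-test one 4 KiB block on the assembled array.
DEV=${DEV:-/dev/md0}     # assumed array device
BLOCK=1598030208         # logical block reported in dmesg
BS=4096                  # matches bs=4K used in the dd write

# Byte offset of a block, handy for cross-checking against sector numbers.
block_offset() {
    echo $(( $1 * $2 ))
}

# Read a single block, bypassing the page cache so the request reaches md.
read_block() {
    dd if="$DEV" of=/dev/null bs="$BS" skip="$1" count=1 iflag=direct
}
```

With these values, block_offset "$BLOCK" "$BS" gives 6545531731968 bytes; read_block "$BLOCK" would need to be run as root against the real array.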

By the way, I ran SMART long tests a day or two ago, and they reportedly completed without errors on all disks involved.

Thank you again so much for your help!

>-------- Original Message --------
>Subject: Re: Troubleshooting "Buffer I/O error" on reading md device
>Local Time: January 5, 2018 2:05 AM
>UTC Time: January 5, 2018 1:05 AM
>From: neilb@xxxxxxxx
>To: RQM <rqm@xxxxxxxxxxxxxx>
>linux-raid@vger.kernel.org <linux-raid@xxxxxxxxxxxxxxx>
>
>On Thu, Jan 04 2018, RQM wrote:
>
>>Hello,
>>
>>>I needed "mdadm -E" the components of the array, so the partitions
>>> rather than the whole devices. e.g. /dev/sdb1, not /dev/sdb.
>>
>>Sorry, that should have occurred to me. Here's the output:
>>https://paste.ubuntu.com/26319689/
>>Indeed, bad blocks are present on two devices.
>>
>>>You can remove the bad block by over-writing it.
>>> dd if=/dev/zero of=/dev/md0 bs=4K seek=1598030208 count=1
>>> though that might corrupt some file containing the block.
>>
>>I have tried that just now, but before running mdadm -E above. dd appears to succeed when writing to the bad block, but after that, reading that block with dd fails again:
>> "dd: error reading '/dev/md0': Input/output error"
>>In dmesg, the following errors appear:
>> [220444.068715] VFS: Dirty inode writeback failed for block device md0 (err=-5).
>> [220445.850229] Buffer I/O error on dev md0, logical block 1598030208, async page read
>>I have repeated the dd write-then-read experiment, with identical results.
>>The filesystem is indeed ext4, but it's not of tremendous importance to me that all data is recovered, as the array contains backup data only. However, I would like to get the backup system back into operation, so I'd be very grateful for further hints how to get the array into a usable state.
>>
> The easiest approach is to remove the bad block log.
> Stop array, and then assemble with --update=no-bbl.
> e.g.
> mdadm -S /dev/md0
> mdadm -A /dev/md0 --update=no-bbl /dev/sd[bcdef]3
>
> Before you do that though, please take a dump of the metadata and send
> it to me, in case I get motivated to figure out why writing didn't work.
>
> mkdir /tmp/dump
> mdadm --dump /tmp/dump /dev/sd[bcdef]3
> tar czSf /tmp/dump.tgz /tmp/dump
>
> The files in /tmp are sparse images of the hard drives with only
> the metadata present.  The 'S' flag to tar should cause it to notice
> this and create a tiny tgz file.
> Then send me /tmp/dump.tgz.
>
> Thanks,
> NeilBrown
>
>
>>Thank you so much for your help so far!
>>
>>To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>