Re: Rebuilding an array with a corrupt disk.

Sean Hildebrand wrote:
> How's that?
> 
> The spare (/dev/sdd) seems to be fine. I haven't tried the rebuild
> with any other disks, but smartctl doesn't report any issues with
> /dev/sdd, only /dev/sda.
Sorry, I misread what you said - I thought you had errors on both sda and sdd.


> Ran ddrescue, managed to recover 559071 MB, but the other 191GB was
> thousands upon thousands of read errors.
Looking fairly bad then.

> Now, prior to this with the array in degraded mode I was able to
> access and modify all files I found, but mdadm would always fail on
> rebuild, and fsck would always fail and the array would go down
> roughly 75% through the scan, presumably when first encountering bad
> sections of the disk.
Sounds reasonable.

> ddrescue has not yet finished - It's currently "Splitting error
> areas..." - Given that the array has been mountable prior to running
> ddrescue, is it safe to assume that once it's done, the
> partially-cloned /dev/sda1 that ddrescue has output onto /dev/sdd1
> will be mountable as part of the array so I can assess file loss?
It should be. And the raid shouldn't die while fsck runs this time, since the
reads will be coming from the clone on sdd rather than the failing drive.
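If you want a quick sanity check before trusting it, something along these
lines should do - the device names here are just examples, substitute your own
array name and members, and this assumes the filesystem sits directly on the
md device:

  mdadm --assemble --scan     # or name the member partitions explicitly
  cat /proc/mdstat            # confirm the array came up (degraded is expected)
  fsck -n /dev/md0            # read-only check; -n makes no changes

The -n run just tells you how bad the damage looks without modifying anything.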

However, if any of the other disks dies then you will have problems.
It's safer to add the spare when it arrives and get back to a redundant setup;
then, if any one drive dies, fsck can still continue.
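When the new drive turns up, adding it back is roughly this - /dev/md0 and
/dev/sde1 are placeholders, use whatever the new disk's partition actually is:

  mdadm /dev/md0 --add /dev/sde1   # add the new disk; the rebuild starts automatically
  watch cat /proc/mdstat           # monitor the resync progress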

Also note that you *may* recover more data by using ddrescue with a logfile and
re-running it after chilling the failed drive etc. Google...
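For reference, the usual pattern is something like this (sda1 -> sdd1 as in
your mail; the logfile name is arbitrary):

  ddrescue -n /dev/sda1 /dev/sdd1 rescue.log    # first pass: copy the easy areas, don't scrape failed blocks yet
  ddrescue -r 3 /dev/sda1 /dev/sdd1 rescue.log  # second pass: go back and retry the bad areas

Because the logfile records which sectors are still bad, you can stop and
restart as often as you like (e.g. after cooling the drive) and it will only
re-read the areas it hasn't recovered yet.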

The longer you persevere with ddrescue, the better your chance of recovering
more data. Maybe keep at it until the replacement spare arrives. Again - read
up on ddrescue; the list archives had something on it in the last few weeks.

> I am unsure of how data is spread through a RAID5. Each disk gets an
> equal portion of data, but do drives fill up in linear fashion?
No. The data is spread amongst the drives, so you've lost everything from
roughly the 75% mark upwards on all of them, not just on one disk.

> I ask
> this because whether the array is being rebuilt or fscked it fails at
> roughly 75% through either operation, yet I never had the array go
> down while I was using it - Only when fsck was running or mdadm was
> rebuilding.

> The array is 2.69 TB, with 1.57TB currently free - If the drives do
> fill linearly (Or even semi-linearly) is it likely that the majority
> of the 191GB of errors are empty space?
I don't know how the various filesystems allocate space, and it also depends on
previous usage - was the disk ever fuller? etc. With 'normal' filesystems
(ext/xfs/etc) there's no guarantee the free space sits at the end, so the
answer is undefined. Also bear in mind that, because the data is striped, the
191GB of bad sectors on one disk corresponds to roughly 191GB x 4 - so ~800GB
of the md device's address space is affected.
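If you're curious about the exact striping, mdadm will show you the chunk size
and parity layout for the array (again assuming /dev/md0 and /dev/sdd1 as
examples):

  mdadm --detail /dev/md0      # chunk size, layout, and member order
  mdadm --examine /dev/sdd1    # per-member superblock view of the same info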

Sorry - keep fingers crossed.

> If this isn't making much sense I apologize. I'm sleep deprived and
> not enjoying the prospect of losing large quantities of my data.
Sad, but people do use RAID instead of backups.
RAID is a convenience that helps with uptime in the event of a failure and
reduces the risk of data-loss between backups.

Let's see what can be done to get it all back though - you may be lucky.

David
--