Re: What just happened to my disks/RAID5 array?

Hello again Phil (and of course also any bystanders :))!

On 09/14/2011 01:41 PM, Phil Turmel wrote:
> Good Morning Johannes,
> 
> Sorry about the delay...  worked late yesterday.

Really no need to be sorry about anything; I'm perfectly aware that I'm
not entitled to any support from you, and I greatly appreciate it
whenever you volunteer to share your insights with me. So thank you
very, very much for getting back to me on this!

>> The controller seems alive still - lsdrv (output attached) lists 
>> the kernel still having registered some of the component devices.
> 
> Actually, it doesn't.  None of the /dev/md0 components are present. 
> Ditto for the "mdadm -D" report.

You are right; none of the disks were present once I got to the machine.
The lvm and fs on top seemed rather confused about what had happened, so
I killed all processes with file handles open on the fs in question,
unmounted the fs, and rebooted. The board's BIOS took an awkwardly long
time scanning for SATA devices on the SB's ports, but in the end showed
all of them in the POST screen. After booting the kernel, one of the
drives dropped out rather early in the process (about two or three
seconds after the kernel picked it up), and on all subsequent reboots
(even with the failed drive, or all but one drive, disconnected) the box
hangs indefinitely during POST while scanning the SATA controller. My
guess is that the board/controller is fried.

> [...] "--assemble" is safe in all known cases.  Use it first.  With 
> the whole controller gone, you probably have consistent event counts 
> after all, and --assemble should just work.  "--assemble --force" is 
> somewhat less safe, but I wouldn't hesitate to use it in a situation 
> where the drives truly dropped out together.  You'll likely find some
> problems with fsck if files were actively being written when the 
> array dropped out, but the vast majority of your filesystem(s) should
> be safe.

Thanks, I will try that as soon as I can get my hands on a machine
with enough free SATA ports - I might have to replace the whole system
(at least board, CPU and RAM) and will have to do some research before
settling on specific hardware. I can do without that part of my data
for a few days, probably even weeks, but losing it forever would still
be hard to swallow.
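
For the record, here is a minimal sketch of how the "consistent event
counts" Phil mentions could be checked before assembling. It assumes the
text of "mdadm --examine <device>" for each member has already been
captured, and that the report has an "Events : N" line (older 0.90
metadata may print it as "0.N"). The function names are hypothetical,
not part of mdadm.

```python
import re

# Matches the "Events : N" line of an "mdadm --examine" report.
# Assumption: 0.90 metadata may print the counter as "0.N"; only N is kept.
_EVENTS_RE = re.compile(r"^\s*Events\s*:\s*(?:\d+\.)?(\d+)\s*$", re.MULTILINE)


def event_count(examine_text):
    """Return the event counter from one --examine report, or None."""
    match = _EVENTS_RE.search(examine_text)
    return int(match.group(1)) if match else None


def counts_consistent(reports):
    """Compare event counters across member devices.

    reports maps a device name to the captured --examine text.
    Returns (per-device counts, True if every device reported the same
    counter and none was missing).
    """
    counts = {dev: event_count(text) for dev, text in reports.items()}
    distinct = set(counts.values())
    return counts, (None not in distinct and len(distinct) == 1)
```

If all members report the same counter, a plain --assemble would be
expected to work; differing counters are the case where --force comes
into play, as described above.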

> Other procedures are progressively less safe.  I prefer to not offer 
> specifics until you've hooked your drives back up, and generated 
> fresh "lsdrv" and "mdadm" reports.

I promise I'll get back to the list if --assemble doesn't do the job
right away once I have put together a system that can handle all of the
array's member devices.

Again, thank you very much for your time and sharing your expertise!

-- 
with best regards:
- Johannes Truschnigg ( johannes@xxxxxxxxxxxxxxx )

www: http://johannes.truschnigg.info/
phone: +43 650 2 133337
xmpp: johannes@xxxxxxxxxxxxxxx

Please do not bother me with HTML-eMail or attachments. Thank you.


