On Sun, Dec 27, 2009 at 2:47 PM, Leslie Rhorer <lrhorer@xxxxxxxxxxx> wrote:
>> On Sun, 2009-12-27 at 00:13 -0600, Leslie Rhorer wrote:
>> > > # mdadm --examine /dev/sdb1
>> > > mdadm: No md superblock detected on /dev/sdb1.
>> > >
>> > > (Does this mean that sdb1 is bad? or is that OK?)
>> >
>> > It doesn't necessarily mean the drive is bad, but the superblock is
>> > gone. Are you having mdadm monitor your array(s) and send informational
>> > messages to you upon RAID events? If not, then what may have happened
>> > is you lost the superblock on sdb1 and at some other time - before or
>> > after - lost the sda drive. Once both events had taken place, your
>> > array is toast.
>> Right, I need to set up monitoring...
>
> Um, yeah. A RAID array won't prevent drives from going up in smoke,
> and if you don't know a drive has failed, you won't know you need to fix
> something - until a second drive fails.
>
>> > All may not be lost, however. First of all, take care when
>> > re-arranging not to lose track of which drive was which at the outset.
>> > In fact, other than the sda drive, you might be best served not to move
>> > anything. Take special care if the system re-assigns drive letters, as
>> > it can easily do.
>> So should I just "move" the A drive? and try to fire it back up?
>
> At this point, yeah. Don't lose track of from where and to where it
> has been moved, though.
>
>> > What are the contents of /etc/mdadm.conf?
>> >
>>
>> mdadm.conf contains this:
>> ARRAY /dev/md0 level=raid10 num-devices=4
>> UUID=3d93e545:c8d5baec:24e6b15c:676eb40f
>
> Yeah, that doesn't help much.
>
>> So, by re-creating, do you mean I should try to run the "mdadm --create"
>> command again the same way I did back when I created the array
>> originally? Will that wipe out my data?
>
> Not in and of itself, no. If you get the drive order wrong
> (different than when it was first created) and resync or write to the
> array, then it will munge the data, but all creating the array does is
> create the superblocks.
>
>> # smartctl -l selftest /dev/sda
>> smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
>> Home page is http://smartmontools.sourceforge.net/
>>
>> Standard Inquiry (36 bytes) failed [No such device]
>> Retrying with a 64 byte Standard Inquiry
>> Standard Inquiry (64 bytes) failed [No such device]
>> A mandatory SMART command failed: exiting. To continue, add one or more
>> '-T permissive' options.
>
> Well, we kind of knew that. Either the drive is dead, or there is a
> hardware problem in the controller path. Hope for the latter, although a
> drive with a frozen platter can sometimes be resurrected, and if the drive
> electronics are bad but the servo assemblies are OK, replacing the
> electronics is not difficult. Otherwise, it's a goner.
>
>> # smartctl -l selftest /dev/sdb
>> smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
>> Home page is http://smartmontools.sourceforge.net/
>>
>> === START OF READ SMART DATA SECTION ===
>> SMART Self-test log structure revision number 1
>> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
>> # 1  Extended offline    Completed: read failure       90%      7963         543357
>
> Oooh! That's bad. Really bad. Your earlier post showed the
> superblock is a 0.90 version. The 0.90 superblock is stored near the end
> of the partition. Your drive is suffering a heart attack when it gets
> near the end of the drive.
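As a quick sanity check on that point: the 0.90 superblock sits 64K-aligned
within the last 128K of the partition, so you can compute where it starts
and see how close the failure is. A rough, untested sketch (blockdev needs
the partition to still answer ioctls, and note that smartctl's
LBA_of_first_error counts from the start of the whole drive, not the
partition):

SECTORS=$(blockdev --getsz /dev/sdb1)   # partition size in 512-byte sectors
SB=$(( SECTORS / 128 * 128 - 128 ))     # 0.90 superblock start sector, per the md on-disk layout
echo "superblock begins at sector $SB of $SECTORS"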
> If you can't get your sda drive working again, then I'm
> afraid you've lost some data, maybe all of it. Trying to rebuild a
> partition from scratch when part of it is corrupted is not for the faint
> of heart. If you are lucky, you might be able to dd part of the sdb drive
> onto a healthy one and manually restore the superblock. That, or since
> the sda drive does appear in /dev, you might have some luck copying some
> of it to a new drive.
>
> Beyond that, you are either going to need the advice of someone who
> knows much more about md and Linux than I do, or else the services of a
> professional drive recovery expert. They don't come cheap.
>
>> This is strange, now I am getting info from mdadm --examine that is
>> different than before...
>
> It looks like sda may be responding for the time being. I suggest
> you try to assemble the array, and if successful, copy whatever data you
> can to a backup device. Do not mount the array as read-write until you
> have recovered everything you can. If some data is orphaned, it might be
> in the lost+found directory. If that's successful, I suggest you find out
> why you had two failures and start over. I wouldn't use a 0.90
> superblock, though, and you definitely want to have monitoring enabled.

If you have the spare drives/space, I -highly- recommend using dd_rescue or
ddrescue to copy the suspected-bad drives' contents to clean drives.

http://www.linuxfoundation.org/collaborate/workgroups/linux-raid/raid_recovery
has a script that tries out the drive-order combinations so you can see
which one loses the least data.
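For the ddrescue copy itself, what I run is roughly the following
(untested as typed here, with /dev/sdc standing in for whatever your clean
target drive actually is):

ddrescue -n /dev/sdb /dev/sdc /root/sdb-rescue.log    # first pass: grab the easy sectors, skip the rough spots
ddrescue -r3 /dev/sdb /dev/sdc /root/sdb-rescue.log   # second pass: retry the bad blocks a few times

The logfile is the important part: it lets you stop and resume, and it
records exactly which sectors never came back.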
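Once you are working from the copies, Leslie's assemble-and-copy-off step
would look something like this (a sketch only, assuming the four members
show up as sda1 through sdd1 on your system):

mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1   # add --force if a member is flagged stale
mount -o ro /dev/md0 /mnt/recovery    # stay read-only until everything is copied off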
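If the superblocks are too far gone for --assemble, what that script
automates is essentially a loop over create attempts like this (again a
sketch; the member order is the unknown you are searching for):

mdadm --create /dev/md0 --metadata=0.90 --level=10 --raid-devices=4 \
      --assume-clean /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1   # one candidate order; --assume-clean skips the resync
fsck -n /dev/md0    # read-only check; a wrong order shows up as garbage
mdadm --stop /dev/md0

Only ever do that against the rescued copies, never the originals.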
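As for the monitoring everyone keeps mentioning, the minimal setup is just
a mail address in mdadm.conf plus the monitor daemon (adjust the address,
obviously; most distros will start the daemon for you once it is set):

MAILADDR you@example.com             # add this line to /etc/mdadm.conf

mdadm --monitor --scan --daemonise   # mails you on Fail / DegradedArray events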