On 21/04/11 20:29, John Valarti wrote:
Hi there. Please pardon my lack of experience and expertise here, as this is my first time posting. Where I work there is a fairly old fileserver running CentOS 4, kernel 2.6.9-100.EL. Recently it failed: it tries to boot, but stops part way with:

    raid5: not enough operational devices for md1 (2/4 failed)

This machine holds data for a number of users, and of course it turns out the backups have not been properly done for a few months (the responsible staff member left). I am in the position of being the only person with any chance of recovering the data, and I am certainly NOT an expert!

Here is what I have done so far. I disconnected the drives one at a time and determined which 2 are "failed". I pulled those out and, on another machine, ran Seagate Seatest for Linux against them. Both came out as healthy, although one apparently has a lot of uncommitted bad sectors, or so the disk tool on a Fedora 14 machine tells me. Each of the 4 disks has 2 partitions, and after testing I could see the partitions on each disk with fdisk. I did not try to mount anything, since these are just RAID members and I know no single drive holds a complete filesystem.

The first partition on each drive is small - /boot - and seems to be a RAID1 across all 4 drives. Those are healthy enough to get partway into a boot: the machine boots to the point of trying to access / and then kernel panics. The / and other filesystems are on a RAID5 made from the second partition of each of the 4 disks.

I have put all 4 disks back in the machine and, using CentOS install/recovery media, have it up in rescue mode. At this point I believe I need to rebuild the RAID5, and I understand I probably only get one chance to do it right, so I am writing to beg for some help. I do not want to lose other people's data. Can anyone make a suggestion? Thanks in advance for any help!
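Before you change anything, it's worth recording what each member's RAID superblock currently says; mdadm --examine only reads, so it's safe to run from rescue mode. A sketch, assuming the RAID5 members are the second partitions and the disks show up as sda-sdd (adjust the names to whatever rescue mode actually gives you):

    # Read-only: dump the md superblock of each RAID5 member
    for d in /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2; do
        mdadm --examine "$d"
    done > /tmp/md1-examine.txt

Pay particular attention to the "Events" count and "Update Time" on each member - they tell you which of the two "failed" drives dropped out of the array first, and that is the one whose data is stale.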
After that, my first thought would be to get /all/ the disks, not just the "failed" ones, out of the machine. Make full images of them (with ddrescue or something similar) to files on another disk, and then work only with those images. Don't work directly on the original disks - one wrong step there and you permanently lose whatever chance you have of recovering the data. Once you have the images you can copy them and try out recovery strategies: all it costs is some disk space and some time, and there's no risk of making things worse.
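For the imaging itself, GNU ddrescue is the usual choice because it keeps a log file and skips ahead on read errors instead of stalling. A sketch, assuming the drive being copied shows up as /dev/sda on the recovery machine and /mnt/backup has enough free space (both names are examples):

    # First pass: grab everything that reads cleanly, skip bad areas (-n)
    ddrescue -n /dev/sda /mnt/backup/sda.img /mnt/backup/sda.log
    # Second pass: go back and retry the bad areas a few times
    ddrescue -r3 /dev/sda /mnt/backup/sda.img /mnt/backup/sda.log

Repeat for each of the four disks. The log file also lets you stop and resume without re-reading what you already have.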
Once you've recovered some (hopefully most) of your data from the images, buy four /new/ disks to put in the machine and do your restore onto those. You don't want to reuse the failing disks, and the other two, being equally old and worn, are probably high-risk too.
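For the recovery attempts themselves, one approach is to expose the partitions inside each image copy and let mdadm force-assemble the array read-only. A sketch, assuming kpartx is available and the copies live under /work/copies (all names are examples, and --force is exactly the kind of step you only ever run against copies):

    # Attach each image copy to a loop device and map its partitions
    losetup /dev/loop0 /work/copies/sda.img
    losetup /dev/loop1 /work/copies/sdb.img
    losetup /dev/loop2 /work/copies/sdc.img
    losetup /dev/loop3 /work/copies/sdd.img
    kpartx -a /dev/loop0   # creates /dev/mapper/loop0p1, /dev/mapper/loop0p2
    kpartx -a /dev/loop1
    kpartx -a /dev/loop2
    kpartx -a /dev/loop3

    # Force-assemble the RAID5 from the second partitions; mdadm prefers
    # the members with the freshest event counts
    mdadm --assemble --force /dev/md1 /dev/mapper/loop0p2 /dev/mapper/loop1p2 \
        /dev/mapper/loop2p2 /dev/mapper/loop3p2

    # Mount read-only and copy the data off
    mount -o ro /dev/md1 /mnt/recover

With 2 of 4 members marked failed, --force is the only way mdadm will bring the array up at all. Expect some corruption in whatever was being written when the second drive dropped out, which is why you want the drive that failed /first/ left out if the event counts make the order clear.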