Well, I got the RAID up. I had reiserfsck work its mojo (it looks like I lost
lots of folder names, but the files appear to remember who they are), BUT
mount segfaults (or something segfaults) every time I try to mount the damn
thing. I'm going to try running 2.6-something, hoping that maybe one of the
tools I built was just too new for SuSE 8.2 / Linux 2.4.23, but I highly
doubt it. Who knows, maybe 2.6 will behave more nicely. I hope mount -o ro
will be enough to protect me if it doesn't (a rough sketch of the read-only
checks I have in mind is at the bottom of this mail). Any ideas what might
be segfaulting mount? This is from /var/log/messages from about the time I
tried mounting:

Mar  6 01:14:39 ilneval kernel: raid5: switching cache buffer size, 4096 --> 1024
Mar  6 01:14:39 ilneval kernel: raid5: switching cache buffer size, 1024 --> 4096
Mar  6 01:14:39 ilneval kernel: reiserfs: found format "3.6" with standard journal
Mar  6 01:14:41 ilneval kernel: Unable to handle kernel paging request at virtual address e09ce004
Mar  6 01:14:41 ilneval kernel:  printing eip:
Mar  6 01:14:41 ilneval kernel: c01839b5
Mar  6 01:14:41 ilneval kernel: *pde = 1f5f7067
Mar  6 01:14:41 ilneval kernel: *pte = 00000000
Mar  6 01:14:41 ilneval kernel: Oops: 0002
Mar  6 01:14:41 ilneval kernel: CPU:    0
Mar  6 01:14:41 ilneval kernel: EIP:    0010:[<c01839b5>]    Not tainted
Mar  6 01:14:41 ilneval kernel: EFLAGS: 00010286
Mar  6 01:14:41 ilneval kernel: eax: dae13bc0   ebx: e09c6000   ecx: dae13c08   edx: dae13bc0
Mar  6 01:14:41 ilneval kernel: esi: df26a000   edi: 00001000   ebp: dbf32000   esp: dbeb1e2c
Mar  6 01:14:41 ilneval kernel: ds: 0018   es: 0018   ss: 0018
Mar  6 01:14:41 ilneval kernel: Process mount (pid: 829, stackpage=dbeb1000)
Mar  6 01:14:41 ilneval kernel: Stack: 00000902 00001003 00001000 00000003 00000001 df26a000 00000902 dbf32000
Mar  6 01:14:41 ilneval kernel:        c01843cc df26a000 00000400 00002000 dbeb1e68 00000001 00000000 00000000
Mar  6 01:14:41 ilneval kernel:        00000246 00000000 00000000 00000902 fffffff3 df26a000 00000001 c013a4ba
Mar  6 01:14:41 ilneval kernel: Call Trace: [<c01843cc>] [<c013a4ba>] [<c013ad4b>] [<c014c8ae>] [<c013b0d0>]
Mar  6 01:14:41 ilneval kernel:    [<c014da3e>] [<c014dd6c>] [<c014db95>] [<c014e15a>] [<c010745f>]
Mar  6 01:14:41 ilneval kernel:
Mar  6 01:14:41 ilneval kernel: Code: 89 44 fb 04 b8 01 00 00 00 8b 96 f4 00 00 00 8b 4c fa 04 85

On Friday 05 March 2004 12:25 pm, Corey McGuire wrote:
> That kinda worked!!!!!! I need to fsck it, but I'm still afraid of fscking
> it up...
>
> Does anyone in San Jose/San Francisco/anywhere-in-frag'n-California have a
> free TB I can use for a dd? I will offer you my first child!
>
> If I need to sweeten the deal, I have LOTS to share... I have a TB of
> goodies just looking to be backed up!
>
> On Friday 05 March 2004 10:14 am, you wrote:
> > I had a 2-disk failure; I will explain what I did.
> > One disk was bad, and it affected all disks on that SCSI bus.
> > The RAID software got into a bad state, so I think I needed to reboot
> > or power cycle.
> > After the reboot, it said 2 disks were non-fresh or whatever.
> > My array had 14 disks, 7 on the bus with the 2 non-fresh disks.
> > I could not do a dd read test with much success on most of the disks;
> > maybe 2 or 3 seemed OK, but not if I did 2 dd's at the same time.
> > So I unplugged all disks but one and tested that one; on success, repeat
> > with the next disk. I found one disk that did not work. So I connected
> > the 6 good disks, did 6 dd's at the same time, and all was well.
> >
> > So now I have 13 of 14 disks, and 1 of the 13 is non-fresh. I issued
> > this command.
> >
> > mdadm -A --force /dev/md2 --scan
> >
> > For some reason my filesystem was corrupt. I noticed that the spare disk
> > was in the list. I knew the rebuild to the spare never finished; it may
> > not have been synced at all, since so many disks were not working. So I
> > knew the spare should not be part of the array yet!
> >
> > I had trouble stopping the array, so I rebooted.
> >
> > This time I listed the disks explicitly, excluding the spare and the
> > failed disk.
> >
> > mdadm -A --force /dev/md2 /dev/sdk1 /dev/sdd1 /dev/sdl1 /dev/sde1
> > /dev/sdm1 /dev/sdf1 /dev/sdn1 /dev/sdg1 /dev/sdh1 /dev/sdo1 /dev/sdi1
> > /dev/sdp1 /dev/sdj1
> >
> > I did not include the missing disk, but I did include the non-fresh
> > disk. Now my filesystem is fine.
> >
> > I added the spare and it rebuilt. A good day! I bet if this had happened
> > to a hardware RAID it could not have been saved.
> >
> > I replaced the bad disk and added it as a spare.
> > That was about 1 month ago, and everything is still fine.
> >
> > You will need to install mdadm if you don't have it. mdadm does not use
> > raidtab; it uses /etc/mdadm.conf.
> >
> > man mdadm for details!
> >
> > Good luck!
> >
> > Guy
> >
> > ==========================================================================
> > Tips:
> >
> > This will give details of each disk:
> > mdadm -E /dev/hda3
> > Repeat for hdc3, hde3, hdg3, hdi3, hdk3.
> >
> > dd test, to determine whether a disk's surface is good (this is just a
> > read test!):
> > dd if=/dev/hda of=/dev/null bs=64k
> > Repeat for hdc, hde, hdg, hdi, hdk.
> >
> > My mdadm.conf:
> >
> > MAILADDR bugzilla@watkins-home.com
> > PROGRAM /root/bin/handle-mdadm-events
> >
> > DEVICE /dev/sd[abcdefghijklmnopqrstuvwxyz][12]
> >
> > ARRAY /dev/md0 level=raid1 num-devices=2
> >   UUID=1fb2890c:2c9c47bf:db12e1e3:16cd7ffe
> >
> > ARRAY /dev/md1 level=raid1 num-devices=2
> >   UUID=8f183b62:ea93fe30:a842431c:4b93c7bb
> >
> > ARRAY /dev/md2 level=raid5 num-devices=14
> >   UUID=8357a389:8853c2d1:f160d155:6b4e1b99
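As promised above, here is the rough order I'm planning to try things in
before risking a read-write mount, mostly cribbed from Guy's steps. The
device names (/dev/md2 and the /dev/sd?1 members) and the mount point
/mnt/recovery are just examples lifted from this thread, so treat it as a
sketch, not gospel:

# Assemble from the known-good members only, forcing the non-fresh disk in
# and leaving out the failed disk and the never-synced spare:
mdadm -A --force /dev/md2 /dev/sdk1 /dev/sdd1 /dev/sdl1 /dev/sde1 \
    /dev/sdm1 /dev/sdf1 /dev/sdn1 /dev/sdg1 /dev/sdh1 /dev/sdo1 \
    /dev/sdi1 /dev/sdp1 /dev/sdj1

# Confirm the array is actually running (degraded is fine) before touching
# the filesystem:
cat /proc/mdstat
mdadm --detail /dev/md2

# Then a read-only mount first. It is not a perfect guarantee (reiserfs may
# still replay its journal), but it avoids normal writes while poking around:
mkdir -p /mnt/recovery
mount -t reiserfs -o ro /dev/md2 /mnt/recovery

Only once the read-only mount looks sane would I re-add the spare and let it
rebuild, the way Guy describes above.

Also, since Guy's mdadm.conf came up: rather than typing the UUIDs in by
hand, mdadm can print ARRAY lines for whatever is currently assembled; on my
version at least, this does it:

mdadm --detail --scan >> /etc/mdadm.conf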