hi ya danilo

i suspect you don't have a spare disk? since you have three drives and hde, hdg, and hdi are all in use... so the error message is correct (the array will keep running in degraded mode)

- some motherboards/pci cards are dumb... you CANNOT boot off of hde, hdf, hdg, hdh... (bad choice of motherboard for booting raid5, i suppose)

- if your disk partition type is "raid-autodetect" (fd), then your setup is fine... just move everything to hda, hdc, and hde and hope that hde is the drive that fails (see the fdisk example at the end of this mail)

- boot from a standalone floppy/linux-bbc to move your root around

- if you cannot boot off hda+hdc or hda+hde or hdc+hde, then i'd use a simple mirrored (raid1) root so that you can boot off of either hda or hdc, while your raid5 data is still raid5 across hda, hdc, and hde... and i'd add a new hdf disk too (a raidtab sketch along these lines is at the end of this mail)

and yes... pulling the ide cable off of the disk is a good test for booting and testing raid5 (the raidtools commands for a software-side failure test are at the end too)

cool... c ya
alvin
http://www.1U-Raid5.net

On Thu, 31 Jan 2002, Danilo Godec wrote:

> I have three disks, each partitioned into three partitions. For /boot, I
> use RAID1 over the first partitions (hde1, hdg1 and hdi1).
>
> For root, I use RAID5 over the second set of partitions (hde2, hdg2 and
> hdi2), and for swap, I use RAID5 over the third set of partitions (hde3,
> hdg3 and hdi3).
>
> The problem is that each time I try to simulate a disk failure (using
> raidsetfaulty) on any of the RAID5 arrays, I get a nasty error (this is a
> 'copy-paste' version; the actual md device is blocked):
>
> raid5: Disk failure on hdi2, disabling device. Operation continuing on 2 devices
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
>  printing eip:
> c01d1cd3
> *pde = 00000000
> Oops: 0000
> CPU:    0
> EIP:    0010:[<c01d1cd3>]    Not tainted
> EFLAGS: 00010246
> eax: dfec34a0   ebx: 00003802   ecx: 00000400   edx: 00001000
> esi: 00000000   edi: dfc5e000   ebp: dfee0da0   esp: dfeb9f50
> ds: 0018   es: 0018   ss: 0018
> Process raid5d (pid: 10, stackpage=dfeb9000)
> Stack: dfee0da0 dfee0e60 dfec5dd4 dfec5dc0 00000000 dfec34a0 c01d1f78 dfee0da0
>        c02572ff 00000004 dfeb8000 dfed7c00 dfee0ca0 00000001 00000064 00000000
>        c01cdac9 dfec5dc0 dfeb8000 dfeb8000 dfee0ca0 00000001 c0288000 00000246
> Call Trace: [<c01d1f78>] [<c01cdac9>] [<c01d4c1c>] [<c0105794>]
>
> Code: f3 a5 8b 44 24 14 8b 54 24 10 f0 0f ab 50 18 8b 44 24 14 e8
> [root@temp /root]# <6>md: recovery thread got woken up ...
> md1: no spare disk to reconstruct array! -- continuing in degraded mode
> md: recovery thread finished ...
>
> It seems something is wrong in the raid5 code... Can anyone confirm/deny this?
>
> Thanks,   D.
>
> PS: I tried using ext2 and ext3 on the root partition. It didn't matter.
>
> PPS: If I disconnect one of the disks (the power cable) to simulate a
> real hardware failure, disk IO is blocked (i.e. nothing that isn't
> already in memory can be loaded or executed) and the system keeps telling
> me that the disk has lost an interrupt - for a looong time. The RAID
> system didn't detect the failure and kick the disk out of the array(s).
> Shouldn't it?
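
in case it helps, here's a rough /etc/raidtab for the mirrored-root layout
suggested above -- a minimal sketch only, assuming the hda/hdc/hde disks and
partition numbers from my reply (not danilo's actual hde/hdg/hdi setup), so
adjust to taste:

    # /etc/raidtab -- sketch, assuming hda/hdc/hde with part 1 = root, part 2 = data
    # /dev/md0: mirrored root, so the box can boot off either hda or hdc
    raiddev /dev/md0
        raid-level              1
        nr-raid-disks           2
        nr-spare-disks          0
        persistent-superblock   1
        chunk-size              4
        device                  /dev/hda1
        raid-disk               0
        device                  /dev/hdc1
        raid-disk               1

    # /dev/md1: raid5 data across all three drives
    raiddev /dev/md1
        raid-level              5
        nr-raid-disks           3
        nr-spare-disks          0
        persistent-superblock   1
        parity-algorithm        left-symmetric
        chunk-size              32
        device                  /dev/hda2
        raid-disk               0
        device                  /dev/hdc2
        raid-disk               1
        device                  /dev/hde2
        raid-disk               2

persistent-superblock 1 is what lets the kernel autodetect and start the
arrays at boot (together with the fd partition type).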
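
and for the software-side failure test, the raidtools sequence (raidsetfaulty
is what danilo already used) would look something like this -- assuming
/dev/md1 is the raid5 array and /dev/hde2 is the member you want to fail:

    # mark hde2 as faulty in md1 (this is the step that oopsed above)
    raidsetfaulty /dev/md1 /dev/hde2
    # remove the failed member from the running array
    raidhotremove /dev/md1 /dev/hde2
    # check the array state -- md1 should show up degraded
    cat /proc/mdstat
    # add the member back in; raid5 reconstruction starts automatically
    raidhotadd /dev/md1 /dev/hde2
    # watch the rebuild progress
    cat /proc/mdstat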
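
to check or set the raid-autodetect partition type mentioned above, it's a
one-keystroke fix in fdisk -- for example on hda (substitute your own disk
and partition number):

    fdisk -l /dev/hda      # the "Id" column should read fd (Linux raid autodetect)
    fdisk /dev/hda
      t                    # change a partition's system id
      1                    # partition number
      fd                   # Linux raid autodetect
      w                    # write the table and exit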