On Thursday January 31, danci@agenda.si wrote:
> I have three disks, each partitioned into three partitions. For /boot, I
> use RAID1 over the first partitions (hde1, hdg1 and hdi1).
>
> For root, I use RAID5 over the second set of partitions (hde2, hdg2 and hdi2)
> and for swap, I use RAID5 over the third set of partitions (hde3, hdg3 and
> hdi3).
>
> The problem is that each time I try to simulate a disk failure (using
> raidsetfaulty) on any of the RAID5 arrays, I get a nasty error (this is a
> 'copy-paste' version, the actual md device is blocked):
>
> raid5: Disk failure on hdi2, disabling device. Operation continuing on 2
> devices
> Unable to handle kernel NULL pointer dereference at virtual address
> 00000000
> printing eip:
> c01d1cd3

Any chance of running this Oops through ksymoops to see where the problem
really is?

NeilBrown

> *pde = 00000000
> Oops: 0000
> CPU: 0
> EIP: 0010:[<c01d1cd3>] Not tainted
> EFLAGS: 00010246
> eax: dfec34a0 ebx: 00003802 ecx: 00000400 edx: 00001000
> esi: 00000000 edi: dfc5e000 ebp: dfee0da0 esp: dfeb9f50
> ds: 0018 es: 0018 ss: 0018
> Process raid5d (pid: 10, stackpage=dfeb9000)
> Stack: dfee0da0 dfee0e60 dfec5dd4 dfec5dc0 00000000 dfec34a0 c01d1f78 dfee0da0
> c02572ff 00000004 dfeb8000 dfed7c00 dfee0ca0 00000001 00000064 00000000
> c01cdac9 dfec5dc0 dfeb8000 dfeb8000 dfee0ca0 00000001 c0288000 00000246
> Call Trace: [<c01d1f78>] [<c01cdac9>] [<c01d4c1c>] [<c0105794>]
>
> Code: f3 a5 8b 44 24 14 8b 54 24 10 f0 0f ab 50 18 8b 44 24 14 e8
> [root@temp /root]# <6>md: recovery thread got woken up ...
> md1: no spare disk to reconstruct array! -- continuing in degraded mode
> md: recovery thread finished ...
>
>
> It seems something is wrong in the raid5 code... Can anyone confirm/deny this?
>
> Thanks, D.
>
> PS: I tried using ext2 and ext3 on the root partition. It didn't matter.
>
> PPS: If I disconnect one of the disks (the power cable) to simulate a
> real hardware failure, the disk IO is blocked (i.e. nothing that isn't
> already in memory can be loaded or executed) and the system is telling
> me that the disk has lost interrupt - for a looong time. The RAID system
> didn't detect the failure and kick the disk out of the array(s). Shouldn't it?
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
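
For anyone wondering what decoding the Oops involves: the bracketed addresses in the EIP and Call Trace lines only become meaningful once they are matched against the symbol table of the exact kernel that crashed. The Python sketch below shows roughly that kind of lookup. It is an illustration under stated assumptions, not a replacement for ksymoops. The symbol table in it is invented (the names `hypothetical_func_a` to `hypothetical_func_d` and their start addresses are made up) and says nothing about where this particular Oops actually occurred. It also assumes the usual three-column "address type name" layout of a System.map file.

```python
import bisect
import re


def parse_system_map(text):
    """Parse 'address type name' lines into a sorted list of (address, name)."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            addr, _sym_type, name = parts
            symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, addr):
    """Return 'name+0xoffset' for the nearest symbol at or below addr, else None."""
    starts = [start for start, _name in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None  # address lies below every symbol in this table
    base, name = symbols[i]
    return f"{name}+0x{addr - base:x}"


def oops_addresses(oops_text):
    """Extract bracketed addresses such as [<c01d1cd3>] from Oops text."""
    return [int(m, 16) for m in re.findall(r"\[<([0-9a-fA-F]{8})>\]", oops_text)]


# INVENTED symbol table, for illustration only. The real names must come
# from the System.map of the kernel that produced the Oops.
HYPOTHETICAL_MAP = """\
c01cd000 T hypothetical_func_a
c01d1c00 t hypothetical_func_b
c01d1f00 t hypothetical_func_c
c01d4000 T hypothetical_func_d
"""

# Address lines copied from the Oops quoted above.
OOPS = """\
EIP: 0010:[<c01d1cd3>] Not tainted
Call Trace: [<c01d1f78>] [<c01cdac9>] [<c01d4c1c>] [<c0105794>]
"""

if __name__ == "__main__":
    table = parse_system_map(HYPOTHETICAL_MAP)
    for address in oops_addresses(OOPS):
        print(f"{address:08x} -> {resolve(table, address)}")
```

With the real System.map in place of the invented table, the first address resolved (the EIP) would name the function in which the NULL pointer was dereferenced, which is the information asked for above. The lookup is only valid if the map matches the running kernel; a map from a different build gives plausible-looking but wrong names.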