ext3, S/W RAID-5 and many services

Stephen C. Tweedie wrote:

>Hi,
>
>On Fri, Jan 18, 2002 at 04:15:28PM +0900, P. Fleury wrote:
> 
>
>>I use ext3 over software RAID-5 and access it through Samba/NFS/HTTP.
>>From time to time the machine hangs and stops responding to any kind of
>>input (no reply to ping, no keyboard or mouse). Only a hard reset gets
>>it back.
>>
>>I also noticed that 2 of the 7 disks run in UDMA 33 and the others in
>>UDMA 100. Does this have any impact (besides performance)?
>>
>>If I do not mount the ext3 partition, the machine runs fine. Any help?
>>
>
>Can you trap kernel log output, in case there's an oops being
>reported?  If you have a text-mode console, you may have to copy it
>down by hand.  If not, it is possible to set up a serial console and
>record the kernel output on another machine.
>
>Cheers,
> Stephen
>
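
For the serial-console route, the crashing machine is typically booted with a kernel parameter such as console=ttyS0,9600n8 and the second machine reads its own serial port. A minimal sketch of the recording side, assuming the port (for example /dev/ttyS0, an assumption here) has already been set to the matching speed; record_console is a hypothetical helper name, not an existing tool:

```python
import io
import time


def record_console(source, sink, stamp=None):
    """Copy lines from source to sink, prefixing each with a timestamp.

    source: any iterable of text lines (e.g. an opened serial device).
    sink:   any writable text file object.
    Returns the number of non-empty lines recorded.
    """
    if stamp is None:
        stamp = lambda: time.strftime("%Y-%m-%d %H:%M:%S")
    count = 0
    for raw in source:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip blank lines sent by the console
        sink.write(f"{stamp()} {line}\n")
        sink.flush()  # keep the log current if the sender dies mid-oops
        count += 1
    return count


if __name__ == "__main__":
    # Stand-in for the serial device so the sketch runs anywhere.
    demo = io.StringIO("Oops: 0002\r\nCPU:    0\r\n")
    out = io.StringIO()
    record_console(demo, out, stamp=lambda: "T")
    print(out.getvalue(), end="")
```

On the recording machine the same function could be pointed at the real device, e.g. with source opened from the serial port and sink opened on a log file in append mode.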

Well, this time I got something. The sequence was:
- Started the machine and used it for half a day, accessing it via NFS,
HTTP, IMAP (3 concurrent sites) and Samba.
- After a while the load went up and login became impossible, even on the
console. After an hour, I could log in as root.
- Tried to reboot, to no avail. After 1 hour of waiting, tried 'telinit
6'. After that, nothing more was possible remotely.
- The machine did not reboot; it said /dev/md cannot be unmounted because
it is busy.
- Hard reset.
- The RAID-5 resync ran for a while, then:

Jan 25 10:35:21 lafleur syslogd 1.4.1: restart.
Jan 25 11:26:38 lafleur kernel: Unable to handle kernel paging request 
at virtual address 493dd238
Jan 25 11:26:38 lafleur kernel:  printing eip:
Jan 25 11:26:38 lafleur kernel: f083eff2
Jan 25 11:26:38 lafleur kernel: *pde = 00000000
Jan 25 11:26:38 lafleur kernel: Oops: 0002
Jan 25 11:26:38 lafleur kernel: CPU:    0
Jan 25 11:26:38 lafleur kernel: EIP:    
0010:[3c59x:__insmod_3c59x_O/lib/modules/2.4.9-13/kernel/drivers/net/3c+-1388558/96]    
Not tainted
Jan 25 11:26:38 lafleur kernel: EIP:    0010:[<f083eff2>]    Not tainted
Jan 25 11:26:38 lafleur kernel: EFLAGS: 00010216
Jan 25 11:26:38 lafleur kernel: eax: 00000000   ebx: 00001000   ecx: 
00000400   edx: 00000000
Jan 25 11:26:38 lafleur kernel: esi: 00000018   edi: 493dd238   ebp: 
00000007   esp: efb19e58
Jan 25 11:26:38 lafleur kernel: ds: 0018   es: 0018   ss: 0018
Jan 25 11:26:38 lafleur kernel: Process raid5d (pid: 19, stackpage=efb19000)
Jan 25 11:26:38 lafleur kernel: Stack: ef861804 00001000 c017c6ad 
c033ce80 00000282 00000282 00000003 c1f6f908
Jan 25 11:26:38 lafleur kernel:        c21d6400 00000000 00000007 
00000000 00000001 00000004 f083ffd8 ef861800
Jan 25 11:26:38 lafleur kernel:        00000002 c01871dd 00000246 
c033ce40 0000000c 0000007c fffffffc fffffff4
Jan 25 11:26:38 lafleur kernel: Call Trace: 
[generic_make_request+241/256] generic_make_request [kernel] 0xf1
Jan 25 11:26:38 lafleur kernel: Call Trace: [<c017c6ad>] 
generic_make_request [kernel] 0xf1
Jan 25 11:26:38 lafleur kernel: 
[3c59x:__insmod_3c59x_O/lib/modules/2.4.9-13/kernel/drivers/net/3c+-1384488/96] 
__insmod_raid5_S.text_L13736 [raid5] 0x1f78
Jan 25 11:26:38 lafleur kernel: [<f083ffd8>] 
__insmod_raid5_S.text_L13736 [raid5] 0x1f78
Jan 25 11:26:38 lafleur kernel: [ide_set_handler+85/92] ide_set_handler 
[kernel] 0x55
Jan 25 11:26:38 lafleur kernel: [<c01871dd>] ide_set_handler [kernel] 0x55
Jan 25 11:26:38 lafleur kernel: [ide_dma_intr+0/156] ide_dma_intr 
[kernel] 0x0
Jan 25 11:26:38 lafleur kernel: [<c0190a3c>] ide_dma_intr [kernel] 0x0
Jan 25 11:26:38 lafleur kernel: [dma_timer_expiry+0/100] 
dma_timer_expiry [kernel] 0x0
Jan 25 11:26:38 lafleur kernel: [<c019114c>] dma_timer_expiry [kernel] 0x0
Jan 25 11:26:38 lafleur kernel: [do_IRQ+144/156] do_IRQ [kernel] 0x90
Jan 25 11:26:38 lafleur kernel: [<c0108110>] do_IRQ [kernel] 0x90
Jan 25 11:26:38 lafleur kernel: 
[3c59x:__insmod_3c59x_O/lib/modules/2.4.9-13/kernel/drivers/net/3c+-1382890/96] 
device_bsize [raid5] 0x222
Jan 25 11:26:38 lafleur kernel: [<f0840616>] device_bsize [raid5] 0x222
Jan 25 11:26:38 lafleur kernel: [md_thread+212/308] md_thread [kernel] 0xd4
Jan 25 11:26:38 lafleur kernel: [<c01b1454>] md_thread [kernel] 0xd4
Jan 25 11:26:38 lafleur kernel: [kernel_thread+38/48] kernel_thread 
[kernel] 0x26
Jan 25 11:26:38 lafleur kernel: [<c010566e>] kernel_thread [kernel] 0x26
Jan 25 11:26:38 lafleur kernel: [md_thread+0/308] md_thread [kernel] 0x0
Jan 25 11:26:38 lafleur kernel: [<c01b1380>] md_thread [kernel] 0x0
Jan 25 11:26:38 lafleur kernel:
Jan 25 11:26:38 lafleur kernel:
Jan 25 11:26:38 lafleur kernel: Code: f3 ab f6 c3 02 74 02 66 ab f6 c3 
01 74 01 aa 8b 14 24 8d 5d
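
For readers of the trace above, the "Oops: 0002" value is the i386 page-fault error code, whose low three bits say what kind of access faulted. A small sketch that decodes it (decode_oops_code is a hypothetical helper name used only for illustration):

```python
def decode_oops_code(code_hex):
    """Decode the low three bits of an i386 page-fault error code.

    bit 0: 0 = page not present, 1 = protection violation
    bit 1: 0 = read access,      1 = write access
    bit 2: 0 = kernel mode,      1 = user mode
    """
    code = int(code_hex, 16)
    return {
        "cause": "protection violation" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }


if __name__ == "__main__":
    print(decode_oops_code("0002"))
```

For 0002 this gives a kernel-mode write to a page that is not present. That appears consistent with the rest of the dump: edi holds the faulting address 493dd238, and the code bytes begin with f3 ab, a repeated store through edi.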


After this, trying to reboot reports that umount2 has problems. The MD
thread is interrupted after the message 'Wait while the system is
restarting', but then nothing happens.

Is there a way to spend less than 30 minutes per day baby-sitting my
server?

--Pascal




