ext3, S/W RAID-5 and many services


 



Stephen C. Tweedie wrote:

 >Hi,
 >
 >On Fri, Jan 18, 2002 at 04:15:28PM +0900, P. Fleury wrote:
 >
 >
 >>I use ext3 over Software RAID-5, and access this through Samba/NFS/HTTP.
 >> From time to time, the machine hangs, with no response to any kind of
 >>input (ping does not respond, nor does the keyboard/mouse). Only a hard
 >>reset does the trick.
 >>
 >>I also noticed that 2 of the 7 disks are in UDMA 33, the others in UDMA
 >>100. Does this have any impact (besides performance)?
 >>
 >>If I do not mount the ext3 partition, it runs fine. Any help?
 >>
 >
 >Can you trap kernel log output, in case there's an oops being
 >reported?  If you have a text-mode console, you may have to copy it
 >down by hand.  If not, it is possible to set up a serial console and
 >record the kernel output on another machine.
 >
 >Cheers,
 > Stephen
 >
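Once the kernel output does reach a log file (locally or on a second machine via a serial console), pulling the oops reports out of a busy syslog is easy to script. The following is a minimal Python sketch, not part of any existing tool: it assumes the classic syslogd line format seen in the log in this message ("Mon DD HH:MM:SS host kernel: ..."), and the function name `extract_oopses` and its start/end markers are illustrative, heuristic choices.

```python
import re
import sys
from typing import Iterable, List

# Assumption: classic syslogd format "Mon DD HH:MM:SS host kernel: message".
KERNEL_LINE = re.compile(r"^\w{3} +\d+ [\d:]{8} \S+ kernel: ?(?P<msg>.*)$")
# Heuristic start/end markers of an i386 2.4-era oops report.
OOPS_START = re.compile(r"Unable to handle kernel|Oops:|kernel BUG at")
OOPS_END = re.compile(r"^Code:")


def extract_oopses(lines: Iterable[str]) -> List[List[str]]:
    """Return each oops as a list of kernel messages (syslog prefix stripped)."""
    reports: List[List[str]] = []
    current: List[str] = []
    for line in lines:
        m = KERNEL_LINE.match(line.rstrip("\n"))
        if not m:
            continue  # not a kernel message (e.g. "syslogd 1.4.1: restart.")
        msg = m.group("msg")
        if not current and OOPS_START.search(msg):
            current = [msg]  # first line of a new report
        elif current:
            current.append(msg)
            if OOPS_END.match(msg):  # the "Code:" line closes the report
                reports.append(current)
                current = []
    if current:
        reports.append(current)  # truncated report, e.g. the machine hung
    return reports


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as f:
        for i, report in enumerate(extract_oopses(f), 1):
            print(f"--- oops {i} ---")
            print("\n".join(report))
```

Run it with the log path as its only argument, for example /var/log/messages on many Linux systems (an assumption; the path depends on the syslogd configuration).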

Well, this time I got something. The sequence was:
- Start the machine and use it for half a day, accessing it via NFS, HTTP,
IMAP (3 concurrent sites) and Samba.
- After a while, the machine load goes up and login is impossible, even on
the console. After an hour, I could log in as root.
- Tried to reboot, to no avail. After an hour of waiting, tried 'telinit
6'. After that, nothing more was possible remotely.
- The machine did not reboot; it reported that /dev/md could not be
unmounted because it was busy.
- Hard reset.
- The RAID-5 resync ran for a while, then:

Jan 25 10:35:21 lafleur syslogd 1.4.1: restart.
Jan 25 11:26:38 lafleur kernel: Unable to handle kernel paging request at virtual address 493dd238
Jan 25 11:26:38 lafleur kernel:  printing eip:
Jan 25 11:26:38 lafleur kernel: f083eff2
Jan 25 11:26:38 lafleur kernel: *pde = 00000000
Jan 25 11:26:38 lafleur kernel: Oops: 0002
Jan 25 11:26:38 lafleur kernel: CPU:    0
Jan 25 11:26:38 lafleur kernel: EIP:    0010:[3c59x:__insmod_3c59x_O/lib/modules/2.4.9-13/kernel/drivers/net/3c+-1388558/96]    Not tainted
Jan 25 11:26:38 lafleur kernel: EIP:    0010:[<f083eff2>]    Not tainted
Jan 25 11:26:38 lafleur kernel: EFLAGS: 00010216
Jan 25 11:26:38 lafleur kernel: eax: 00000000   ebx: 00001000   ecx: 00000400   edx: 00000000
Jan 25 11:26:38 lafleur kernel: esi: 00000018   edi: 493dd238   ebp: 00000007   esp: efb19e58
Jan 25 11:26:38 lafleur kernel: ds: 0018   es: 0018   ss: 0018
Jan 25 11:26:38 lafleur kernel: Process raid5d (pid: 19, stackpage=efb19000)
Jan 25 11:26:38 lafleur kernel: Stack: ef861804 00001000 c017c6ad c033ce80 00000282 00000282 00000003 c1f6f908
Jan 25 11:26:38 lafleur kernel:        c21d6400 00000000 00000007 00000000 00000001 00000004 f083ffd8 ef861800
Jan 25 11:26:38 lafleur kernel:        00000002 c01871dd 00000246 c033ce40 0000000c 0000007c fffffffc fffffff4
Jan 25 11:26:38 lafleur kernel: Call Trace: [generic_make_request+241/256] generic_make_request [kernel] 0xf1
Jan 25 11:26:38 lafleur kernel: Call Trace: [<c017c6ad>] generic_make_request [kernel] 0xf1
Jan 25 11:26:38 lafleur kernel: [3c59x:__insmod_3c59x_O/lib/modules/2.4.9-13/kernel/drivers/net/3c+-1384488/96] __insmod_raid5_S.text_L13736 [raid5] 0x1f78
Jan 25 11:26:38 lafleur kernel: [<f083ffd8>] __insmod_raid5_S.text_L13736 [raid5] 0x1f78
Jan 25 11:26:38 lafleur kernel: [ide_set_handler+85/92] ide_set_handler [kernel] 0x55
Jan 25 11:26:38 lafleur kernel: [<c01871dd>] ide_set_handler [kernel] 0x55
Jan 25 11:26:38 lafleur kernel: [ide_dma_intr+0/156] ide_dma_intr [kernel] 0x0
Jan 25 11:26:38 lafleur kernel: [<c0190a3c>] ide_dma_intr [kernel] 0x0
Jan 25 11:26:38 lafleur kernel: [dma_timer_expiry+0/100] dma_timer_expiry [kernel] 0x0
Jan 25 11:26:38 lafleur kernel: [<c019114c>] dma_timer_expiry [kernel] 0x0
Jan 25 11:26:38 lafleur kernel: [do_IRQ+144/156] do_IRQ [kernel] 0x90
Jan 25 11:26:38 lafleur kernel: [<c0108110>] do_IRQ [kernel] 0x90
Jan 25 11:26:38 lafleur kernel: [3c59x:__insmod_3c59x_O/lib/modules/2.4.9-13/kernel/drivers/net/3c+-1382890/96] device_bsize [raid5] 0x222
Jan 25 11:26:38 lafleur kernel: [<f0840616>] device_bsize [raid5] 0x222
Jan 25 11:26:38 lafleur kernel: [md_thread+212/308] md_thread [kernel] 0xd4
Jan 25 11:26:38 lafleur kernel: [<c01b1454>] md_thread [kernel] 0xd4
Jan 25 11:26:38 lafleur kernel: [kernel_thread+38/48] kernel_thread [kernel] 0x26
Jan 25 11:26:38 lafleur kernel: [<c010566e>] kernel_thread [kernel] 0x26
Jan 25 11:26:38 lafleur kernel: [md_thread+0/308] md_thread [kernel] 0x0
Jan 25 11:26:38 lafleur kernel: [<c01b1380>] md_thread [kernel] 0x0
Jan 25 11:26:38 lafleur kernel:
Jan 25 11:26:38 lafleur kernel:
Jan 25 11:26:38 lafleur kernel: Code: f3 ab f6 c3 02 74 02 66 ab f6 c3 01 74 01 aa 8b 14 24 8d 5d
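For reading the oops above: on i386, the number in the "Oops: 0002" line of a paging fault is the page-fault error code pushed by the CPU, and its low three bits say why the access failed. A minimal Python sketch of that decoding follows; the function name `decode_pf_error` and the constant names are hypothetical, chosen only for illustration. Read this way, 0002 is a kernel-mode write to a page that is not present, which fits "*pde = 00000000". The faulting address 493dd238 also equals edi, and the first Code: bytes, f3 ab, decode to `rep stos` (a store through es:edi), assuming the Code: bytes start at EIP as they do on i386 2.4 kernels.

```python
from typing import Dict

# Low bits of the i386 page-fault error code (names are illustrative).
PF_PROT = 0x1   # set: protection violation; clear: page not present
PF_WRITE = 0x2  # set: write access; clear: read access
PF_USER = 0x4   # set: fault in user mode; clear: kernel (supervisor) mode


def decode_pf_error(code: int) -> Dict[str, str]:
    """Translate the low three bits of an x86 page-fault error code into words."""
    return {
        "cause": "protection violation" if code & PF_PROT else "page not present",
        "access": "write" if code & PF_WRITE else "read",
        "mode": "user" if code & PF_USER else "kernel",
    }


if __name__ == "__main__":
    # The value from the "Oops: 0002" line, which the kernel prints in hex.
    print(decode_pf_error(int("0002", 16)))
```

Running it prints `{'cause': 'page not present', 'access': 'write', 'mode': 'kernel'}`.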


After this, trying to reboot reports that umount2 has problems, and the MD
thread is interrupted after the message 'Wait while the system is
restarting', but then nothing else happens.

Is there a way to spend less than 30 minutes per day babysitting my
server?

--Pascal





