CentOS-4 Kernel panic

mailing-lists at hughesjr.com (Johnny Hughes) · Tue Apr 12 21:00:29 2005

On Tue, April 12, 2005 3:08 pm, Bob Pierce said:
> Hi all,
>
> We are running a new Centos-4 server, and it has kernel panicked on us 4
> times in the last month. After the first kernel panic we hooked up a
> serial console to the server and captured the output in order to have a
> record of what happens.  I've included the error messages from the last
> time it locked up... but it doesn't really mean much to me. Anybody have
> any ideas what might be causing this server lock up?
>
> Server description:
> -Dell PE1750 - dual 2.8GHz Xeon (with Hyper-Threading on) - 2GB DDR RAM
> -Perc4-DI onboard RAID using 3 SCSI drives in a RAID-5 configuration
> -ext3 file system
> -kernel-smp-2.6.9-5.0.3.EL
> -mysql - from distribution
> -2 postfix instances rebuilt with MySQL support
> -amavisd-new
> -clamav
> -spamassassin
> -rbldnsd
> -bind
>
>
> Here's the captured output from a serial console connected to the server
> at time of fault.
>
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
>  printing eip:
> f8872da8
> *pde = 35562001
> Oops: 0000 [#1]
> SMP
> Modules linked in: md5 ipv6 autofs4 sunrpc dm_mod button battery ac ohci_hcd tg3 floppy sg ext3 jbd megaraid_mbox megaraid_mm sd_mod scsi_mod
> CPU:    1
> EIP:    0060:[<f8872da8>]    Not tainted VLI
> EFLAGS: 00010246   (2.6.9-5.0.3.ELsmp)
> EIP is at __journal_file_buffer+0x1b/0x221 [jbd]
> eax: 00000000   ebx: d2fff26c   ecx: 00000008   edx: c2327680
> esi: c2327680   edi: 00000008   ebp: 00000000   esp: f7533dd4
> ds: 007b   es: 007b   ss: 0068
> Process kjournald (pid: 210, threadinfo=f7533000 task=f75825b0)
> Stack: 00000000 00000000 f148fad8 f7f66200 d2fff26c c2327680 f887351b 00000286
>        00000000 00000000 00000000 00000000 00000000 d2517e6c f7f66200 caa4c50c
>        00001f18 00000000 f75825b0 c011e8d2 f7533e44 f7533e44 f750c054 f8836f24
> Call Trace:
>  [<f887351b>] journal_commit_transaction+0x310/0xfb1 [jbd]
>  [<c011e8d2>] autoremove_wake_function+0x0/0x2d
>  [<f8836f24>] megaraid_isr+0x1ad/0x1bf [megaraid_mbox]
>  [<c011e8d2>] autoremove_wake_function+0x0/0x2d
>  [<c011bcd5>] finish_task_switch+0x30/0x66
>  [<c02c4363>] schedule+0x833/0x869
>  [<c0127e62>] del_timer_sync+0x7a/0x9c
>  [<f8875e6d>] kjournald+0xc7/0x215 [jbd]
>  [<c011e8d2>] autoremove_wake_function+0x0/0x2d
>  [<c011e8d2>] autoremove_wake_function+0x0/0x2d
>  [<c011bd1d>] schedule_tail+0x12/0x55
>  [<f8875da0>] commit_timeout+0x0/0x5 [jbd]
>  [<f8875da6>] kjournald+0x0/0x215 [jbd]
>  [<c01041f1>] kernel_thread_helper+0x5/0xb
> Code: 14 ba 01 00 00 00 83 c4 10 89 d0 5b 5e 5f 5d c3 55 31 ed 57 89 cf 56 89 d6 53 53 53 89 c3 c7 44 24 04 00 00 00 00 8b 00 89 04 24 <8b> 00 a9 00 00 08 00 75 29 68 d4 85 87 f8 68 9b 07 00 00 68 55
>

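I have no idea what is causing this. It looks filesystem-related to me: the
oops is in __journal_file_buffer in jbd (the ext3 journaling layer), running
as the kjournald process. However, we have a new kernel that will be included
in CentOS-4.1: kernel-2.6.9-6.37.EL.src.rpm.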
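The fields in the quoted oops that point at the filesystem can be pulled out mechanically. The following is a minimal Python sketch, assuming the 2.6-era i386 oops layout shown in this thread; the name `summarize_oops` and its regexes are hypothetical, and the single opcode is decoded by hand from a lookup table rather than by a disassembler. Run on an excerpt of the captured oops, it reports the fault in `__journal_file_buffer` (offset 0x1b of 0x221) in the jbd module under kjournald, with the faulting bytes `8b 00`, i.e. `mov (%eax),%eax`, while eax is 0: a load through a NULL pointer, which matches "virtual address 00000000".

```python
import re

# Regexes for the oops lines used here. They assume the layout of the
# 2.6-era i386 serial-console capture quoted in this thread.
EIP_RE = re.compile(r"EIP is at (\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)(?: \[(\w+)\])?")
PROC_RE = re.compile(r"Process (\S+) \(pid: (\d+)")
CODE_RE = re.compile(r"<([0-9a-f]{2})>((?: [0-9a-f]{2})*)")
EAX_RE = re.compile(r"eax: ([0-9a-f]{8})")

# Hand-decoded i386 encoding for the one instruction in this oops; this is
# NOT a disassembler. 8b 00 = MOV r32, r/m32 with ModRM 00: load from [eax].
KNOWN_INSNS = {("8b", "00"): "mov (%eax),%eax"}


def summarize_oops(text):
    """Extract faulting function, module, process, instruction and eax."""
    eip = EIP_RE.search(text)
    proc = PROC_RE.search(text)
    code = CODE_RE.search(text)
    eax = EAX_RE.search(text)
    if not (eip and proc and code):
        raise ValueError("not a recognizable i386 oops")
    func, offset, size, module = eip.groups()
    # The byte in <..> is the first byte at EIP; take one more for the ModRM.
    insn = (code.group(1),) + tuple(code.group(2).split()[:1])
    return {
        "function": func,
        "offset": int(offset, 16),
        "size": int(size, 16),
        "module": module or "vmlinux",  # no [module] tag means core kernel
        "process": proc.group(1),
        "pid": int(proc.group(2)),
        "insn_bytes": " ".join(insn),
        "insn": KNOWN_INSNS.get(insn, "unknown"),
        "eax": int(eax.group(1), 16) if eax else None,
    }


# Excerpt of the capture above (Code: line shortened around the <..> byte).
OOPS = """\
EIP is at __journal_file_buffer+0x1b/0x221 [jbd]
eax: 00000000   ebx: d2fff26c   ecx: 00000008   edx: c2327680
Process kjournald (pid: 210, threadinfo=f7533000 task=f75825b0)
Code: 89 c3 c7 44 24 04 00 00 00 00 8b 00 89 04 24 <8b> 00 a9 00 00 08 00 75 29
"""

if __name__ == "__main__":
    for key, value in summarize_oops(OOPS).items():
        print(f"{key}: {value}")
```

For anything beyond this one instruction, the bytes around the `<..>` marker belong in a real disassembler, not a lookup table.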

I would be glad to give you the new i686-smp kernel to see if it solves
your problem.

Are these EM64T Xeons or i686 (32-bit) Xeons?
http://www.intel.com/products/processor/xeon/index.htm
(Looking at the Dell site, I think they are 32-bit.)

(If I am wrong and these are EM64T Xeons, you should have installed the
x86_64 distro instead of the i386 one.)
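One way to settle the EM64T question on the running box, assuming the usual Linux convention that x86 CPUs with 64-bit long mode (Intel's EM64T) list the `lm` flag in /proc/cpuinfo, is the minimal Python sketch below. It checks the flags line of every logical CPU (with Hyper-Threading on, each physical Xeon shows up twice). The function names are hypothetical.

```python
# Assumption: x86 CPUs capable of 64-bit long mode (EM64T on Intel) show the
# "lm" flag in /proc/cpuinfo; 32-bit-only Xeons do not.
def cpu_is_64bit_capable(cpuinfo_text):
    """Return True if every 'flags' line in the cpuinfo text contains 'lm'."""
    flag_sets = [
        line.split(":", 1)[1].split()
        for line in cpuinfo_text.splitlines()
        if line.startswith("flags")
    ]
    if not flag_sets:
        raise ValueError("no 'flags' lines found")
    # Whole-token membership, so flags such as "lahf_lm" do not count as "lm".
    return all("lm" in flags for flags in flag_sets)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the raw cpuinfo text (Linux only)."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    try:
        capable = cpu_is_64bit_capable(read_cpuinfo())
    except (OSError, ValueError) as err:
        print(f"could not check: {err}")
    else:
        print("EM64T (x86_64 capable)" if capable else "32-bit only: i386 is right")
```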

I also recommend the latest SCSI controller BIOS:
http://support.dell.com/support/downloads/format.aspx?c=us&cs=04&l=en&s=bsd&SystemID=PWE_PNT_XEO_1750&os=LE30&osl=en&deviceid=2608&devlib=35&category=35&releaseid=R85295

and server BIOS:
http://support.dell.com/support/downloads/format.aspx?c=us&cs=04&l=en&s=bsd&SystemID=PWE_PNT_XEO_1750&os=LE30&osl=en&deviceid=159&devlib=1&category=1&releaseid=R87618

-- 
Johnny Hughes
<http://www.HughesJR.com/>