On Tue, April 12, 2005 3:08 pm, Bob Pierce said:
> Hi all,
>
> We are running a new Centos-4 server, and it has kernel panicked on us 4
> times in the last month. After the first kernel panic we hooked up a
> serial console to the server and captured the output in order to have a
> record of what happens. I've included the error messages from the last
> time it locked up... but it doesn't really mean much to me. Anybody have
> any ideas what might be causing this server lock up?
>
> Server description:
> -Dell PE1750 - dual 2.8Ghz Xeon (with Hyper Threading on) - 2GB DDR RAM
> - Perc4-DI onboard RAID using 3 scsi drives in raid-5 configuration
> -ext3 file system
> -kernel-smp-2.6.9-5.0.3.EL
> -mysql - from distribution
> -2 postfix instances rebuilt with MySQL support
> -amavisd-new
> -clamav
> -spamassassin
> -rbldnsd
> -bind
>
>
> Here's the captured output from a serial console connected to the server
> at time of fault.
>
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
> printing eip:
> f8872da8
> *pde = 35562001
> Oops: 0000 [#1]
> SMP
> Modules linked in: md5 ipv6 autofs4 sunrpc dm_mod button battery ac ohci_hcd tg3 floppy sg ext3 jbd megaraid_mbox megaraid_mm sd_mod scsi_mod
> CPU: 1
> EIP: 0060:[<f8872da8>] Not tainted VLI
> EFLAGS: 00010246 (2.6.9-5.0.3.ELsmp)
> EIP is at __journal_file_buffer+0x1b/0x221 [jbd]
> eax: 00000000 ebx: d2fff26c ecx: 00000008 edx: c2327680
> esi: c2327680 edi: 00000008 ebp: 00000000 esp: f7533dd4
> ds: 007b es: 007b ss: 0068
> Process kjournald (pid: 210, threadinfo=f7533000 task=f75825b0)
> Stack: 00000000 00000000 f148fad8 f7f66200 d2fff26c c2327680 f887351b 00000286
>        00000000 00000000 00000000 00000000 00000000 d2517e6c f7f66200 caa4c50c
>        00001f18 00000000 f75825b0 c011e8d2 f7533e44 f7533e44 f750c054 f8836f24
> Call Trace:
> [<f887351b>] journal_commit_transaction+0x310/0xfb1 [jbd]
> [<c011e8d2>] autoremove_wake_function+0x0/0x2d
> [<f8836f24>] megaraid_isr+0x1ad/0x1bf [megaraid_mbox]
> [<c011e8d2>] autoremove_wake_function+0x0/0x2d
> [<c011bcd5>] finish_task_switch+0x30/0x66
> [<c02c4363>] schedule+0x833/0x869
> [<c0127e62>] del_timer_sync+0x7a/0x9c
> [<f8875e6d>] kjournald+0xc7/0x215 [jbd]
> [<c011e8d2>] autoremove_wake_function+0x0/0x2d
> [<c011e8d2>] autoremove_wake_function+0x0/0x2d
> [<c011bd1d>] schedule_tail+0x12/0x55
> [<f8875da0>] commit_timeout+0x0/0x5 [jbd]
> [<f8875da6>] kjournald+0x0/0x215 [jbd]
> [<c01041f1>] kernel_thread_helper+0x5/0xb
> Code: 14 ba 01 00 00 00 83 c4 10 89 d0 5b 5e 5f 5d c3 55 31 ed 57 89 cf 56 89 d6 53 53 53 89 c3 c7 44 24 04 00 00 00 00 8b 00 89 04 24 <8b> 00 a9 00 00 08 00 75 29 68 d4 85 87 f8 68 9b 07 00 00 68 55
>

I have no idea what is causing this (it looks like a filesystem process to
me: the oops is in kjournald, inside the jbd module), but we have a new
kernel that will be included in CentOS-4.1. It is
kernel-2.6.9-6.37.EL.src.rpm. I would be glad to give you the new i686-smp
kernel to see if it solves your problem.

Are these EM64T Xeons or i686 (32-bit) Xeons?
http://www.intel.com/products/processor/xeon/index.htm

(Looking at the Dell site, I think they are 32-bit. If I am wrong and they
are EM64T Xeons, you should have installed the x86_64 distro instead of
the i386 one.)

I also recommend the latest SCSI controller BIOS:
http://support.dell.com/support/downloads/format.aspx?c=us&cs=04&l=en&s=bsd&SystemID=PWE_PNT_XEO_1750&os=LE30&osl=en&deviceid=2608&devlib=35&category=35&releaseid=R85295

and server BIOS:
http://support.dell.com/support/downloads/format.aspx?c=us&cs=04&l=en&s=bsd&SystemID=PWE_PNT_XEO_1750&os=LE30&osl=en&deviceid=159&devlib=1&category=1&releaseid=R87618

-- 
Johnny Hughes
<http://www.HughesJR.com/>
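
P.S. On the EM64T versus 32-bit question, you can settle it from the running system instead of the spec sheets. On x86 Linux, a 64-bit capable CPU lists the `lm` (long mode) flag on the `flags` line of /proc/cpuinfo. Below is a minimal Python sketch of that check. The function names `supports_64bit` and `check_local_cpu` are made up for illustration, and the script assumes the usual `key : value` layout of /proc/cpuinfo.

```python
def supports_64bit(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists 'lm'."""
    for line in cpuinfo_text.splitlines():
        # Lines look like "flags\t\t: fpu vme de pse ..."; split on the first colon.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags" and "lm" in value.split():
            return True
    return False


def check_local_cpu(path: str = "/proc/cpuinfo") -> str:
    """Read the given cpuinfo file and describe the result (hypothetical helper)."""
    with open(path) as f:
        text = f.read()
    return "64-bit capable (lm flag present)" if supports_64bit(text) else "32-bit only (no lm flag)"
```

Calling `print(check_local_cpu())` on the server shows which case applies. If the `lm` flag is missing, the i386 distro is the right one for this hardware.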