We're getting consistent kernel panics on most of our GFS nodes, all
pointing to the same line and file:
Kernel panic: GFS: Assertion failed on line 1227 of file rgrp.c
The longest uptime we've seen was a couple of weeks, but for the most part
the nodes only stay up for a few days. Once one goes down, a couple more
follow, not immediately, but within the hour.
The cluster has five nodes: three lock_gulm servers and two clients.
All of them currently run the same versions of the kernel, GFS and GFS
modules, as listed below.
- GFS-modules-smp-6.0.2.27-0
- GFS-6.0.2.27-0
- 2.4.21-37.ELsmp
- Scientific Linux Release 303 (Fermi), RHES 3, update 3 really.
- This is a new installation.
From /var/log/messages:
Jan 2 12:46:48 fnd0374 kernel: ce1bbbac f8bc7b72 00000246 00001000 db122da8 f8bf4000 db122da8 f8bc7d70
Jan 2 12:46:48 fnd0374 kernel: 00000246 000001f0 00000000 55b6bd08 f60fdd8c 00000005 00000001 ffffffff
Jan 2 12:46:48 fnd0374 kernel: f8be16d0 f8be8c8f f8be8bc4 000004cb 00000016 f60f7c00 00000006 f8bf4000
Jan 2 12:46:48 fnd0374 kernel: Call Trace: [<f8bc7b72>] gfs_asserti [gfs] 0x32 (0xce1bbbb0)
Jan 2 12:46:48 fnd0374 kernel: [<f8bc7d70>] gmalloc [gfs] 0x20 (0xce1bbbc8)
Jan 2 12:46:48 fnd0374 kernel: [<f8be16d0>] blkalloc_internal [gfs] 0x130 (0xce1bbbec)
Jan 2 12:46:48 fnd0374 kernel: [<f8be8c8f>] .rodata.str1.1 [gfs] 0x1da3 (0xce1bbbf0)
Jan 2 12:46:48 fnd0374 kernel: [<f8be8bc4>] .rodata.str1.1 [gfs] 0x1cd8 (0xce1bbbf4)
Jan 2 12:46:48 fnd0374 kernel: [<f8be1b8b>] gfs_blkalloc [gfs] 0x7b (0xce1bbc20)
Jan 2 12:46:48 fnd0374 kernel: [<f8bbb90c>] get_datablock [gfs] 0xfc (0xce1bbc4c)
Jan 2 12:46:48 fnd0374 kernel: [<f8bbbc43>] gfs_block_map [gfs] 0x333 (0xce1bbc70)
Jan 2 12:46:48 fnd0374 kernel: [<c0149093>] find_or_create_page [kernel] 0x63 (0xce1bbc9c)
Jan 2 12:46:48 fnd0374 kernel: [<f8bac08c>] gfs_dgetblk [gfs] 0x3c (0xce1bbcec)
Jan 2 12:46:48 fnd0374 kernel: [<f8bb5239>] get_block [gfs] 0xb9 (0xce1bbd28)
Jan 2 12:46:48 fnd0374 kernel: [<c016814b>] __block_prepare_write [kernel] 0x1ab (0xce1bbd64)
Jan 2 12:46:48 fnd0374 kernel: [<c0168b09>] block_prepare_write [kernel] 0x39 (0xce1bbda8)
Jan 2 12:46:48 fnd0374 kernel: [<f8bb5180>] get_block [gfs] 0x0 (0xce1bbdbc)
Jan 2 12:46:48 fnd0374 kernel: [<f8bb58fc>] gfs_prepare_write [gfs] 0x12c (0xce1bbdc8)
Jan 2 12:46:48 fnd0374 kernel: [<f8bb5180>] get_block [gfs] 0x0 (0xce1bbdd8)
Jan 2 12:46:48 fnd0374 kernel: [<c014c053>] do_generic_file_write [kernel] 0x1e3 (0xce1bbdf4)
Jan 2 12:46:48 fnd0374 kernel: [<f8bafbab>] do_do_write [gfs] 0x2ab (0xce1bbe48)
Jan 2 12:46:48 fnd0374 kernel: [<f8baffeb>] do_write [gfs] 0x18b (0xce1bbe94)
Jan 2 12:46:48 fnd0374 kernel: [<f8badf1e>] gfs_walk_vma [gfs] 0x12e (0xce1bbed0)
Jan 2 12:46:48 fnd0374 kernel: [<c0225936>] sock_read [kernel] 0x96 (0xce1bbf50)
Jan 2 12:46:48 fnd0374 kernel: [<f8bb00c1>] gfs_write [gfs] 0x91 (0xce1bbf6c)
Jan 2 12:46:48 fnd0374 kernel: [<f8bafe60>] do_write [gfs] 0x0 (0xce1bbf80)
Jan 2 12:46:48 fnd0374 kernel: [<c0164b27>] sys_write [kernel] 0x97 (0xce1bbf94)
Jan 2 12:46:48 fnd0374 kernel:
Jan 2 12:46:48 fnd0374 kernel: Kernel panic: GFS: Assertion failed on line 1227 of file rgrp.c
Jan 2 12:46:48 fnd0374 kernel: GFS: assertion: "x <= length"
Jan 2 12:46:48 fnd0374 kernel: GFS: time = 1136227608
Jan 2 12:46:48 fnd0374 kernel: GFS: fsid=d0recon:d0.4: RG = 71028427
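To compare panics from different nodes, the summary lines at the end of the trace can be pulled into a small record. The following is only a rough sketch in Python, not anything shipped with GFS: the helper name `parse_panic` and the regular expressions are illustrative assumptions based on the four lines shown here, and other GFS versions may format these messages differently. It also treats the `time` field as seconds since the Unix epoch, which is an assumption, though it is consistent with the syslog stamp if the node logs in US Central time.

```python
import re
from datetime import datetime, timezone

# Panic summary lines copied from the log excerpt above
# (the "Kernel panic" message on a single line).
LOG = """\
Jan 2 12:46:48 fnd0374 kernel: Kernel panic: GFS: Assertion failed on line 1227 of file rgrp.c
Jan 2 12:46:48 fnd0374 kernel: GFS: assertion: "x <= length"
Jan 2 12:46:48 fnd0374 kernel: GFS: time = 1136227608
Jan 2 12:46:48 fnd0374 kernel: GFS: fsid=d0recon:d0.4: RG = 71028427
"""

# Hypothetical patterns, written against the four lines above only.
PATTERNS = {
    "line": r"Assertion failed on line (\d+)",
    "file": r"of file (\S+)",
    "assertion": r'assertion: "([^"]*)"',
    "time": r"GFS: time = (\d+)",
    "fsid": r"fsid=(\S+): RG",
    "rg": r"RG = (\d+)",
}


def parse_panic(text):
    """Collect the fields of one GFS assertion report into a dict."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            fields[name] = match.group(1)
    # Assumption: "time" is seconds since the Unix epoch.
    if "time" in fields:
        stamp = datetime.fromtimestamp(int(fields["time"]), tz=timezone.utc)
        fields["time_utc"] = stamp.isoformat()
    return fields


if __name__ == "__main__":
    for key, value in parse_panic(LOG).items():
        print(f"{key}: {value}")
```

Running the same helper over the panic lines from each node would show at a glance whether the assertion, file, line and resource group (RG) are identical every time or whether only the file and line repeat.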
Appreciate any help.
Thanks,
Paul
--
===========================================================================
Paul Tader <ptader@xxxxxxxx>
Fermi National Accelerator Lab; PO Box 500 Batavia, IL 60510-0500