Hello list,

we are running a two-node GFS 6.1 cluster, and today GFS suddenly crashed on one node. Please have a look at the following log messages:

----
Feb 27 12:23:39 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1: fatal: assertion "FALSE" failed
Feb 27 12:23:39 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1: function = xmote_bh
Feb 27 12:23:39 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1: file = /builddir/build/BUILD/gfs-kernel-2.6.9-60/smp/src/gfs/glock.c, line = 1093
Feb 27 12:23:39 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1: time = 1172575419
Feb 27 12:23:39 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1: about to withdraw from the cluster
Feb 27 12:23:39 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1: waiting for outstanding I/O
Feb 27 12:23:39 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1: telling LM to withdraw
----

We cannot reproduce the problem because we do not know what triggers it. I found an older post on this list describing the same problem, but it offers no real solution or explanation of why it happens.

The cluster's operating system is RHEL4 U4 (x86_64). The kernel version is 2.6.9-42.0.3.ELsmp, and the following GFS rpms are installed and in use:

GFS-6.1.6-1
GFS-kernel-2.6.9-60.3
GFS-kernel-smp-2.6.9-60.3
GFS-kernheaders-2.6.9-60.3

Any hints or tips for looking deeper into this problem, or even a solution, would be great. For more details, please have a look at the attached crash log.

Thanks in advance!

--
Gruss / Regards
Dirk Haller

Feb 27 12:23:43 node2 GFS: fsid=ozeane:lt_atlantik.1: fatal: assertion "FALSE" failed
Feb 27 12:23:43 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:43 node2 GFS: fsid=ozeane:lt_atlantik.1: function = xmote_bh
Feb 27 12:23:43 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:43 node2 GFS: fsid=ozeane:lt_atlantik.1: file = /builddir/build/BUILD/gfs-kernel-2.6.9-60/smp/src/gfs/glock.c, line = 1093
Feb 27 12:23:43 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:43 node2 GFS: fsid=ozeane:lt_atlantik.1: time = 1172575419
Feb 27 12:23:43 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:43 node2 GFS: fsid=ozeane:lt_atlantik.1: about to withdraw from the cluster
Feb 27 12:23:43 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:43 node2 GFS: fsid=ozeane:lt_atlantik.1: waiting for outstanding I/O
Feb 27 12:23:43 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:43 node2 GFS: fsid=ozeane:lt_atlantik.1: telling LM to withdraw
Feb 27 12:23:43 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:43 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1: fatal: assertion "FALSE" failed
Feb 27 12:23:43 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1: function = xmote_bh
Feb 27 12:23:43 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1: file = /builddir/build/BUILD/gfs-kernel-2.6.9-60/smp/src/gfs/glock.c, line = 1093
Feb 27 12:23:43 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1: time = 1172575419
Feb 27 12:23:43 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1: about to withdraw from the cluster
Feb 27 12:23:43 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1: waiting for outstanding I/O
Feb 27 12:23:43 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1: telling LM to withdraw
Feb 27 12:23:43 node2 clurgmgrd: [14717]: <debug> Checking 172.23.50.51, Level 0
Feb 27 12:23:43 node2 clurgmgrd: [14717]: <debug> 172.23.50.51 present on bond1
Feb 27 12:23:43 node2 clurgmgrd: [14717]: <debug> Link for bond1: Detected
Feb 27 12:23:43 node2 clurgmgrd: [14717]: <debug> Link detected on bond1
Feb 27 12:23:46 node1 GFS: fsid=ozeane:lt_atlantik.0: jid=1: Trying to acquire journal lock...
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173243
Feb 27 12:23:46 node1 GFS: fsid=ozeane:lt_atlantik.0: jid=1: Looking at journal...
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173243
Feb 27 12:23:46 node1 GFS: fsid=ozeane:lt_atlantik.0: jid=1: Acquiring the transaction lock...
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173243
Feb 27 12:23:46 node1 GFS: fsid=ozeane:lt_atlantik.0: jid=1: Replaying journal...
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173243
Feb 27 12:23:46 node1 GFS: fsid=ozeane:lt_atlantik.0: jid=1: Replayed 0 of 0 blocks
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173243
Feb 27 12:23:46 node1 GFS: fsid=ozeane:lt_atlantik.0: jid=1: replays = 0, skips = 0, sames = 0
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173243
Feb 27 12:23:46 node1 GFS: fsid=ozeane:lt_atlantik.0: jid=1: Journal replayed in 1s
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173243
Feb 27 12:23:46 node2 lock_dlm: withdraw abandoned memory
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2 GFS: fsid=ozeane:lt_atlantik.1: withdrawn
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2 GFS: fsid=ozeane:lt_atlantik.1: ret = 0x00000003
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node1 GFS: fsid=ozeane:lt_atlantik.0: jid=1: Done
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173243
Feb 27 12:23:46 node2 general protection fault: 0000 [1] SMP
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2 CPU 0
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2 Modules linked in: nfsd exportfs lockd nfs_acl sg cpqci(U) mptctl mptbase netconsole netdump i2c_dev i2c_core sunrpc ext3 jbd button battery ac ohci_hcd hw_random shpchp floppy md5 ipv6 lock_dlm(U) dlm(U) gfs(U) lock_harness(U) cman(U) bonding(U) dm_round_robin dm_multipath qla2300 qla2xxx scsi_transport_fc cciss sd_mod scsi_mod dm_snapshot dm_mirror dm_mod tg3 e1000
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2 Pid: 17539, comm: lock_dlm1 Tainted: P 2.6.9-42.0.3.ELsmp
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2 RIP: 0010:[<ffffffffa013debc>] <ffffffffa013debc>{:gfs:run_queue+477}
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2 RSP: 0018:00000100e5891db8 EFLAGS: 00010202
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2 RAX: 000000000006000f RBX: 000001006f426920 RCX: 0000000000000001
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2 RDX: ffffffffa017e9c0 RSI: 0000000000000001 RDI: 000001006f4268c8
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2 RBP: 000001006d604420 R08: ffffffff803e1fe8 R09: 0000000000000001
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2 R10: 0000000100000000 R11: ffffffff8011e884 R12: 0000000000000001
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2 R13: 560a11000001000a R14: ffffff0000481000 R15: 000001006f4268c8
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2 FS: 0000002a96a970e0(0000) GS:ffffffff804e5180(0000) knlGS:00000000f61d1bb0
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2 CR2: 0000002a96a86880 CR3: 0000000000101000 CR4: 00000000000006e0
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2 Process lock_dlm1 (pid: 17539, threadinfo 00000100e5890000, task 00000100e8ed17f0)
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2 Stack: 0000000000000000 000001006f4268f4 000001006d604420 000001006f4268f4
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2 000001006f4268c8 ffffff0000481000 0000000000000003 ffffffffa013facf
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2 0000000000000001 0000000000000001
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2 Call Trace:<ffffffffa013facf>{:gfs:xmote_bh+953} <ffffffffa0141426>{:gfs:gfs_glock_cb+194}
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2 <ffffffffa01a8a75>{:lock_dlm:dlm_async+1989} <ffffffff80133dfe>{__wake_up_common+67}
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2 <ffffffff80133dad>{default_wake_function+0} <ffffffff8014b4f4>{keventd_create_kthread+0}
Feb 27 12:23:46 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:46 node2 <ffffffffa01a82b0>{:lock_dlm:dlm_async+0}

Feb 27 12:23:43 node2 GFS: fsid=ozeane:lt_atlantik.1: fatal: assertion "FALSE" failed
Feb 27 12:23:43 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:43 node2 GFS: fsid=ozeane:lt_atlantik.1: function = xmote_bh
Feb 27 12:23:43 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:43 node2 GFS: fsid=ozeane:lt_atlantik.1: file = /builddir/build/BUILD/gfs-kernel-2.6.9-60/smp/src/gfs/glock.c, line = 1093
Feb 27 12:23:43 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:43 node2 GFS: fsid=ozeane:lt_atlantik.1: time = 1172575419
Feb 27 12:23:43 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:43 node2 GFS: fsid=ozeane:lt_atlantik.1: about to withdraw from the cluster
Feb 27 12:23:43 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:43 node2 GFS: fsid=ozeane:lt_atlantik.1: waiting for outstanding I/O
Feb 27 12:23:43 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:43 node2 GFS: fsid=ozeane:lt_atlantik.1: telling LM to withdraw
Feb 27 12:23:43 syslog-server netdump[5743]: Got strange package from ip 0xac173242
Feb 27 12:23:43 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1: fatal: assertion "FALSE" failed
Feb 27 12:23:43 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1: function = xmote_bh
Feb 27 12:23:43 node2 kernel: GFS: fsid=ozeane:lt_atlantik.1: file = /builddir/build/BUILD/gfs-kernel-2.6.9-60/smp/src/gfs/glock.c, line = 1093
Feb 27 12:23:47 node1 clurgmgrd: [15224]: <debug> Link detected on bond1
Feb 27 12:23:48 node1 clurgmgrd: [15224]: <notice> Using atlantik as NetBIOS name (service atlantik)
Feb 27 12:23:48 node1 clurgmgrd: [15224]: <debug> Checking Samba instance "atlantik"
Feb 27 12:23:48 node1 clurgmgrd: [15224]: <debug> Checking 172.23.50.52, Level 0
Feb 27 12:23:48 node1 smbd[10164]: [2007/02/27 12:23:42, 0] printing/print_cups.c:cups_cache_reload(85)
Feb 27 12:23:48 node1 smbd[31559]: [2007/02/27 12:23:42, 0] printing/print_cups.c:cups_cache_reload(85)
Feb 27 12:23:48 node1 smbd[10164]: Unable to connect to CUPS server localhost - Connection refused
Feb 27 12:23:48 node1 smbd[31559]: Unable to connect to CUPS server localhost - Connection refused
Feb 27 12:23:48 node1 smbd[10164]: [2007/02/27 12:23:42, 0] printing/print_cups.c:cups_cache_reload(85)
Feb 27 12:23:48 node1 smbd[31559]: [2007/02/27 12:23:42, 0] printing/print_cups.c:cups_cache_reload(85)
Feb 27 12:23:48 node1 smbd[10164]: Unable to connect to CUPS server localhost - Connection refused
Feb 27 12:23:48 node1 smbd[31559]: Unable to connect to CUPS server localhost - Connection refused
Feb 27 12:23:48 node1 kernel: GFS: fsid=ozeane:lt_atlantik.0: jid=1: Trying to acquire journal lock...
Feb 27 12:23:48 node1 kernel: GFS: fsid=ozeane:lt_atlantik.0: jid=1: Looking at journal...
Feb 27 12:23:48 node1 kernel: GFS: fsid=ozeane:lt_atlantik.0: jid=1: Acquiring the transaction lock...
Feb 27 12:23:48 node1 kernel: GFS: fsid=ozeane:lt_atlantik.0: jid=1: Replaying journal...
Feb 27 12:23:48 node1 kernel: GFS: fsid=ozeane:lt_atlantik.0: jid=1: Replayed 0 of 0 blocks
Feb 27 12:23:48 node1 kernel: GFS: fsid=ozeane:lt_atlantik.0: jid=1: replays = 0, skips = 0, sames = 0
Feb 27 12:23:48 node1 kernel: GFS: fsid=ozeane:lt_atlantik.0: jid=1: Journal replayed in 1s
Feb 27 12:23:48 node1 kernel: GFS: fsid=ozeane:lt_atlantik.0: jid=1: Done

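A note for anyone reading the crash log: the netdump lines report the sender as a hex value, and the GFS "time =" field looks like a Unix epoch. The following is only a minimal Python sketch for decoding both. It assumes the hex value is an IPv4 address with the most significant byte first and that the time field counts seconds since 1970 in UTC; neither is confirmed by the log itself, and the helper names are made up for illustration.

```python
import ipaddress
from datetime import datetime, timezone


def decode_netdump_ip(hex_ip: str) -> str:
    """Turn a value like '0xac173242' into dotted-quad notation.

    Assumption: the value is an IPv4 address, most significant byte first.
    """
    return str(ipaddress.IPv4Address(int(hex_ip, 16)))


def decode_gfs_time(epoch: int) -> str:
    """Render the GFS 'time = ...' value as a UTC timestamp.

    Assumption: the value is seconds since the Unix epoch.
    """
    moment = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


if __name__ == "__main__":
    print(decode_netdump_ip("0xac173242"))  # 172.23.50.66
    print(decode_netdump_ip("0xac173243"))  # 172.23.50.67
    print(decode_gfs_time(1172575419))      # 2007-02-27 11:23:39 UTC
```

Under those assumptions, the decoded time is one hour behind the 12:23:39 syslog timestamp in the mail body, which would fit a syslog clock running at UTC+1.
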
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster