Hey folks,

I have 2 nodes running GFS 6.1.5:

[root@tf1 ~]# rpm -qa | grep -i gfs
GFS-6.1.5-0
GFS-kernheaders-2.6.9-49.1
GFS-kernel-smp-2.6.9-49.1
[root@tf1 ~]# rpm -qa | grep -i ccs
ccs-devel-1.0.3-0
ccs-1.0.3-0
[root@tf1 ~]# uname -a
Linux tf1.localdomain 2.6.9-34.ELsmp #1 SMP Fri Feb 24 16:54:53 EST 2006 i686 i686 i386 GNU/Linux

Last week, both of them went down on us unexpectedly. One had panicked and the other was powered off. These systems are NOT in production yet, so there was some data on the GFS partition, but I'm pretty sure there was not much activity when the boxes went down. Any help on what to do about this would be appreciated.

Here is the log from the one that panicked:

Jun 10 03:59:07 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45030 seconds.
Jun 10 03:59:37 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45060 seconds.
Jun 10 04:00:07 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45090 seconds.
Jun 10 04:00:37 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45120 seconds.
Jun 10 04:01:01 tf1 crond(pam_unix)[15618]: session opened for user root by (uid=0)
Jun 10 04:01:01 tf1 crond(pam_unix)[15618]: session closed for user root
Jun 10 04:01:07 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45150 seconds.
Jun 10 04:01:37 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45180 seconds.
Jun 10 04:02:01 tf1 crond(pam_unix)[15620]: session opened for user root by (uid=0)
Jun 10 04:02:03 tf1 kernel: des 1
Jun 10 04:02:03 tf1 kernel: clvmd total nodes 1
Jun 10 04:02:03 tf1 kernel: lv1 rebuild resource directory
Jun 10 04:02:03 tf1 kernel: clvmd rebuild resource directory
Jun 10 04:02:03 tf1 kernel: clvmd rebuilt 0 resources
Jun 10 04:02:03 tf1 kernel: clvmd purge requests
Jun 10 04:02:03 tf1 kernel: clvmd purged 0 requests
Jun 10 04:02:03 tf1 kernel: clvmd mark waiting requests
Jun 10 04:02:03 tf1 kernel: clvmd marked 0 requests
Jun 10 04:02:03 tf1 kernel: clvmd purge locks of departed nodes
Jun 10 04:02:03 tf1 kernel: clvmd purged 0 locks
Jun 10 04:02:03 tf1 kernel: clvmd update remastered resources
Jun 10 04:02:03 tf1 kernel: clvmd updated 1 resources
Jun 10 04:02:03 tf1 kernel: clvmd rebuild locks
Jun 10 04:02:03 tf1 kernel: clvmd rebuilt 0 locks
Jun 10 04:02:03 tf1 kernel: clvmd recover event 7 done
Jun 10 04:02:03 tf1 kernel: clvmd move flags 0,0,1 ids 4,7,7
Jun 10 04:02:03 tf1 kernel: clvmd process held requests
Jun 10 04:02:03 tf1 kernel: clvmd processed 0 requests
Jun 10 04:02:03 tf1 kernel: clvmd resend marked requests
Jun 10 04:02:03 tf1 kernel: clvmd resent 0 requests
Jun 10 04:02:03 tf1 kernel: clvmd recover event 7 finished
Jun 10 04:02:03 tf1 kernel: lv1 rebuilt 518 resources
Jun 10 04:02:03 tf1 kernel: lv1 purge requests
Jun 10 04:02:03 tf1 kernel: lv1 purged 0 requests
Jun 10 04:02:03 tf1 kernel: lv1 mark waiting requests
Jun 10 04:02:03 tf1 kernel: lv1 marked 0 requests
Jun 10 04:02:03 tf1 kernel: lv1 purge locks of departed nodes
Jun 10 04:02:03 tf1 kernel: lv1 purged 530 locks
Jun 10 04:02:03 tf1 kernel: lv1 update remastered resources
Jun 10 04:02:03 tf1 kernel: lv1 updated 20609 resources
Jun 10 04:02:03 tf1 kernel: lv1 rebuild locks
Jun 10 04:02:03 tf1 kernel: lv1 rebuilt 0 locks
Jun 10 04:02:03 tf1 kernel: lv1 recover event 7 done
Jun 10 04:02:03 tf1 kernel: lv1 move flags 0,0,1 ids 5,7,7
Jun 10 04:02:03 tf1 kernel: lv1 process held requests
Jun 10 04:02:03 tf1 kernel: lv1 processed 0 requests
Jun 10 04:02:03 tf1 kernel: lv1 resend marked requests
Jun 10 04:02:03 tf1 kernel: lv1 resent 0 requests
Jun 10 04:02:03 tf1 kernel: lv1 recover event 7 finished
Jun 10 04:02:03 tf1 kernel: 6851 pr_start last_stop 0 last_start 6 last_finish 0
Jun 10 04:02:03 tf1 kernel: 6851 pr_start count 2 type 2 event 6 flags 250
Jun 10 04:02:03 tf1 kernel: 6851 claim_jid 1
Jun 10 04:02:03 tf1 kernel: 6851 pr_start 6 done 1
Jun 10 04:02:03 tf1 kernel: 6851 pr_finish flags 5a
Jun 10 04:02:03 tf1 kernel: 6840 recovery_done jid 1 msg 309 a
Jun 10 04:02:03 tf1 kernel: 6840 recovery_done nodeid 1 flg 18
Jun 10 04:02:03 tf1 kernel: 6851 pr_start last_stop 6 last_start 7 last_finish 6
Jun 10 04:02:03 tf1 kernel: 6851 pr_start count 1 type 1 event 7 flags 21a
Jun 10 04:02:03 tf1 kernel: 6851 pr_start cb jid 0 id 2
Jun 10 04:02:03 tf1 kernel: 6851 pr_start 7 done 0
Jun 10 04:02:03 tf1 kernel: 6854 recovery_done jid 0 msg 309 11a
Jun 10 04:02:03 tf1 kernel: 6854 recovery_done nodeid 2 flg 1b
Jun 10 04:02:03 tf1 kernel: 6854 recovery_done start_done 7
Jun 10 04:02:03 tf1 kernel: 6850 pr_finish flags 1a
Jun 10 04:02:03 tf1 kernel:
Jun 10 04:02:03 tf1 kernel:
Jun 10 04:02:03 tf1 kernel: lock_dlm: Assertion failed on line 428 of file /usr/src/redhat/BUILD/gfs-kernel-2.6.9-49/smp/src/dlm/lock.c
Jun 10 04:02:03 tf1 kernel: lock_dlm: assertion: "!error"
Jun 10 04:02:03 tf1 kernel: lock_dlm: time = 1252230568
Jun 10 04:02:03 tf1 kernel: lv1: num=3,11 err=-22 cur=-1 req=3 lkf=8
Jun 10 04:02:03 tf1 kernel:
Jun 10 04:02:03 tf1 kernel: ------------[ cut here ]------------
Jun 10 04:02:03 tf1 kernel: kernel BUG at /usr/src/redhat/BUILD/gfs-kernel-2.6.9-49/smp/src/dlm/lock.c:428!
Jun 10 04:02:03 tf1 kernel: invalid operand: 0000 [#1]
Jun 10 04:02:03 tf1 kernel: SMP
Jun 10 04:02:03 tf1 kernel: Modules linked in: nls_utf8 vfat fat usb_storage lock_dlm(U) dcdipm(U) dcdbas(U) parport_pc lp parport autofs4 i2c_dev i2c_core gfs(U) lock_harness(U) dlm(U) cman(U) md5 ipv6 sunrpc button battery ac uhci_hcd ehci_hcd hw_random shpchp eepro100 e100 mii e1000 floppy sg ext3 jbd dm_mod aic7xxx megaraid_mbox megaraid_mm sd_mod scsi_mod
Jun 10 04:02:03 tf1 kernel: CPU: 3
Jun 10 04:02:03 tf1 kernel: EIP: 0060:[<f8bc7779>] Tainted: P VLI
Jun 10 04:02:03 tf1 kernel: EFLAGS: 00010246 (2.6.9-34.ELsmp)
Jun 10 04:02:03 tf1 kernel: EIP is at do_dlm_lock+0x134/0x14e [lock_dlm]
Jun 10 04:02:03 tf1 kernel: eax: 00000001 ebx: ffffffea ecx: c585ace8 edx: f8bcc15f
Jun 10 04:02:03 tf1 kernel: esi: f8bc7798 edi: f77c8400 ebp: c2361600 esp: c585ace4
Jun 10 04:02:03 tf1 kernel: ds: 007b es: 007b ss: 0068
Jun 10 04:02:03 tf1 kernel: Process df (pid: 15930, threadinfo=c585a000 task=d94fa6b0)
Jun 10 04:02:03 tf1 kernel: Stack: f8bcc15f 20202020 33202020 20202020 20202020 20202020 31312020 00000018
Jun 10 04:02:03 tf1 kernel: d2956694 c2361600 00000003 00000000 c2361600 f8bc7828 00000003 f8bcf860
Jun 10 04:02:03 tf1 kernel: f8ba0000 f8bf45b2 00000000 00000001 f4fd2064 f4fd2048 f8ba0000 f8bea5cd
Jun 10 04:02:03 tf1 kernel: Call Trace:
Jun 10 04:02:03 tf1 kernel: [<f8bc7828>] lm_dlm_lock+0x49/0x52 [lock_dlm]
Jun 10 04:02:03 tf1 kernel: [<f8bf45b2>] gfs_lm_lock+0x35/0x4d [gfs]
Jun 10 04:02:03 tf1 kernel: [<f8bea5cd>] gfs_glock_xmote_th+0x130/0x172 [gfs]
Jun 10 04:02:03 tf1 kernel: [<f8be9c91>] rq_promote+0xc8/0x147 [gfs]
Jun 10 04:02:03 tf1 kernel: [<f8be9e7d>] run_queue+0x91/0xc1 [gfs]
Jun 10 04:02:03 tf1 kernel: [<f8beae88>] gfs_glock_nq+0xcf/0x116 [gfs]
Jun 10 04:02:03 tf1 kernel: [<f8beb40f>] gfs_glock_nq_init+0x13/0x26 [gfs]
Jun 10 04:02:03 tf1 kernel: [<f8c0b6d6>] stat_gfs_async+0x119/0x187 [gfs]
Jun 10 04:02:03 tf1 kernel: [<f8c0b80b>] gfs_stat_gfs+0x27/0x4e [gfs]
Jun 10 04:02:03 tf1 kernel: [<c01aa436>] superblock_has_perm+0x1f/0x23
Jun 10 04:02:03 tf1 kernel: [<f8c0387e>] gfs_statfs+0x26/0xc7 [gfs]
Jun 10 04:02:03 tf1 kernel: [<c0158675>] vfs_statfs+0x41/0x59
Jun 10 04:02:03 tf1 kernel: [<c015876b>] vfs_statfs64+0xe/0x28
Jun 10 04:02:03 tf1 kernel: [<c0166d75>] __user_walk+0x4a/0x51
Jun 10 04:02:03 tf1 kernel: [<c0158876>] sys_statfs64+0x52/0xb2
Jun 10 04:02:03 tf1 kernel: [<c014f598>] do_mmap_pgoff+0x568/0x666
Jun 10 04:02:03 tf1 kernel: [<c010b693>] sys_mmap2+0x7e/0xaf
Jun 10 04:02:03 tf1 kernel: [<c011ad21>] do_page_fault+0x0/0x5c6
Jun 10 04:02:03 tf1 kernel: [<c02d2657>] syscall_call+0x7/0xb
Jun 10 04:02:03 tf1 kernel: Code: 26 50 0f bf 45 24 50 53 ff 75 08 ff 75 04 ff 75 0c ff 77 18 68 8a c2 bc f8 e8 ce ae 55 c7 83 c4 38 68 5f c1 bc f8 e8 c1 ae 55 c7 <0f> 0b ac 01 a7 c0 bc f8 68 61 c1 bc f8 e8 7c a6 55 c7 83 c4 20
Jun 10 04:02:03 tf1 kernel: <0>Fatal exception: panic in 5 seconds
Jun 10 04:02:07 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45210 seconds.
Jun 16 10:48:47 tf1 syslogd 1.4.1: restart.

----- End forwarded message -----

--
================================================
| Jason Welsh jason@xxxxxxxxxxxxxx |
| http://monsterjam.org DSS PGP: 0x5E30CC98 |
| gpg key: http://monsterjam.org/gpg/ |
================================================