I'm running FC4 (2.6.13-1.1532_FC4smp) with dlm-1.0.0-3 and GFS-6.1.0-3 on a 3-node cluster. The df command has always been very slow to return output on my GFS-mounted filesystems. Sequence of events:
16:20:00 - node01 was out of the cluster, node02 and node03 were active with 2 gfs filesystems mounted
16:22:10 - node01 joined the cluster and mounted both filesystems successfully
16:22:37 - a df command was attempted by a monitoring script
16:22:54 - I ran /etc/init.d/gfs stop, which failed because one of the filesystems was busy and could not be unmounted (the df command above, which ended up hanging, may have been the cause)
16:22:55 - node02 and node03 panicked and were not properly fenced
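
One way to keep a monitoring script from blocking behind a slow or hung df is to run the command with a timeout. Below is a minimal sketch in Python, not a tested fix for this problem. The helper name run_with_timeout and the 10-second default are assumptions made for illustration. A process blocked inside the kernel may ignore the kill, so the sketch gives up after a short grace period rather than waiting forever.

```python
import subprocess


def run_with_timeout(cmd, timeout_s=10.0):
    """Run cmd and return (returncode, stdout).

    Returns (None, "") if the command did not finish within timeout_s.
    run_with_timeout is a hypothetical helper, not part of any GFS tooling.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    )
    try:
        out, _ = proc.communicate(timeout=timeout_s)
        return proc.returncode, out
    except subprocess.TimeoutExpired:
        proc.kill()
        # A process stuck in uninterruptible sleep may not die on SIGKILL,
        # so wait only briefly for it instead of blocking the caller.
        try:
            proc.communicate(timeout=1.0)
        except subprocess.TimeoutExpired:
            pass
        return None, ""
```

A monitoring script could call run_with_timeout(["df", "-P", mount_point]) for each GFS mount point and treat a None return code as "df did not answer in time" instead of hanging.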
Log messages from node02 at the time of the panic:
Dec 13 16:22:55 node02 kernel: event 22 done
Dec 13 16:22:55 node02 kernel: gfs01 move flags 0,0,1 ids 15,22,22
Dec 13 16:22:55 node02 kernel: gfs01 process held requests
Dec 13 16:22:55 node02 kernel: gfs01 processed 0 requests
Dec 13 16:22:55 node02 kernel: gfs01 resend marked requests
Dec 13 16:22:55 node02 kernel: gfs01 resent 0 requests
Dec 13 16:22:55 node02 kernel: gfs01 recover event 22 finished
Dec 13 16:22:55 node02 kernel: gfs00 move flags 1,0,0 ids 20,20,20
Dec 13 16:22:55 node02 kernel: gfs00 move flags 0,1,0 ids 20,25,20
Dec 13 16:22:55 node02 kernel: gfs00 move use event 25
Dec 13 16:22:55 node02 kernel: gfs00 recover event 25
Dec 13 16:22:55 node02 kernel: gfs00 remove node 1
Dec 13 16:22:55 node02 kernel: gfs00 total nodes 2
Dec 13 16:22:55 node02 kernel: event 22 done
Dec 13 16:22:55 node02 kernel: gfs01 move flags 0,0,1 ids 15,22,22
Dec 13 16:22:55 node02 kernel: gfs01 process held requests
Dec 13 16:22:55 node02 kernel: gfs01 processed 0 requests
Dec 13 16:22:55 node02 kernel: gfs01 resend marked requests
Dec 13 16:22:55 node02 kernel: gfs01 resent 0 requests
Dec 13 16:22:55 node02 kernel: gfs01 recover event 22 finished
Dec 13 16:22:55 node02 kernel: gfs00 move flags 1,0,0 ids 20,20,20
Dec 13 16:22:55 node02 kernel: gfs00 move flags 0,1,0 ids 20,25,20
Dec 13 16:22:55 node02 kernel: gfs00 move use event 25
Dec 13 16:22:55 node02 kernel: gfs00 recover event 25
Dec 13 16:22:55 node02 kernel: gfs00 remove node 1
Dec 13 16:22:55 node02 kernel: gfs00 total nodes 2
Dec 13 16:22:55 node02 kernel: gfs00 rebuild resource directory
Dec 13 16:22:55 node02 kernel: gfs00 rebuilt 1913 resources
Dec 13 16:22:55 node02 kernel: event 22 done
Dec 13 16:22:55 node02 kernel: gfs01 move flags 0,0,1 ids 15,22,22
Dec 13 16:22:55 node02 kernel: gfs01 process held requests
Dec 13 16:22:55 node02 kernel: gfs01 processed 0 requests
Dec 13 16:22:55 node02 kernel: gfs01 resend marked requests
Dec 13 16:22:55 node02 kernel: gfs01 resent 0 requests
Dec 13 16:22:55 node02 kernel: gfs01 recover event 22 finished
Dec 13 16:22:55 node02 kernel: gfs00 move flags 1,0,0 ids 20,20,20
Dec 13 16:22:55 node02 kernel: gfs00 move flags 0,1,0 ids 20,25,20
Dec 13 16:22:55 node02 kernel: gfs00 move use event 25
Dec 13 16:22:55 node02 kernel: gfs00 recover event 25
Dec 13 16:22:55 node02 kernel: gfs00 remove node 1
Dec 13 16:22:55 node02 kernel: gfs00 total nodes 2
Dec 13 16:22:55 node02 kernel: gfs00 rebuild resource directory
Dec 13 16:22:55 node02 kernel: gfs00 rebuilt 1913 resources
Dec 13 16:22:55 node02 kernel: gfs00 purge requests
Dec 13 16:22:55 node02 kernel: gfs00 purged 0 requests
Dec 13 16:22:55 node02 kernel: gfs00 mark waiting requests
Dec 13 16:22:55 node02 kernel: gfs00 mark 2900192 lq 4 nodeid 1
Dec 13 16:22:55 node02 kernel: gfs00 mark 2900192 unlock no rep
Dec 13 16:22:55 node02 kernel: gfs00 marked 1 requests
Dec 13 16:22:55 node02 kernel: gfs00 purge locks of departed nodes
Dec 13 16:22:55 node02 kernel: gfs00 purged 1 locks
Dec 13 16:22:55 node02 kernel: gfs00 update remastered resources
Dec 13 16:22:55 node02 kernel: gfs00 updated 1 resources
Dec 13 16:22:55 node02 kernel: gfs00 rebuild locks
Dec 13 16:22:55 node02 kernel: gfs00 rebuilt 0 locks
Dec 13 16:22:55 node02 kernel: gfs00 recover event 25 done
Dec 13 16:22:55 node02 kernel: gfs00 move flags 0,0,1 ids 20,25,25
Dec 13 16:22:55 node02 kernel: gfs00 process held requests
Dec 13 16:22:55 node02 kernel: gfs00 processed 0 requests
Dec 13 16:22:55 node02 kernel: gfs00 resend marked requests
Dec 13 16:22:55 node02 kernel: gfs00 resend 2900192 lq 4 flg 3080000 node 2/2 "withdraw 1"
Dec 13 16:22:55 node02 kernel: gfs00 unlock done 2900192
Dec 13 16:22:55 node02 kernel: gfs00 resent 1 requests
Dec 13 16:22:55 node02 kernel: gfs00 recover event 25 finished
Dec 13 16:22:55 node02 kernel:
Dec 13 16:22:55 node02 kernel: DLM: Assertion failed on line 1007 of file /usr/src/build/627959-i686/BUILD/smp/src/lockqueue.c
Dec 13 16:22:55 node02 kernel: DLM: assertion: "lkb"
Dec 13 16:22:56 node02 kernel: DLM: time = 6642223
Dec 13 16:22:56 node02 kernel: dlm: reply
Dec 13 16:22:56 node02 kernel: rh_cmd 5
Dec 13 16:22:56 node02 kernel: rh_lkid 2900192
Dec 13 16:22:56 node02 kernel: lockstate 4137259392
Dec 13 16:22:56 node02 kernel: nodeid 3224043367
Dec 13 16:22:56 node02 kernel: status 4294901758
Dec 13 16:22:56 node02 kernel: lkid 4040
Dec 13 16:22:56 node02 kernel: nodeid 1
Dec 13 16:22:56 node02 kernel:
Dec 13 16:22:56 node02 kernel: ------------[ cut here ]------------
Dec 13 16:22:56 node02 kernel: kernel BUG at /usr/src/build/627959-i686/BUILD/smp/src/lockqueue.c:1007!
Dec 13 16:22:56 node02 kernel: invalid operand: 0000 [#1]
Dec 13 16:22:56 node02 kernel: SMP
Dec 13 16:22:56 node02 kernel: Modules linked in: autofs4 i2c_dev i2c_core lock_dlm(U) gfs(U) lock_harness(U) dlm(U) cman(U) ipv6 crc32c libcrc32c iscsi_sfnet(U) scsi_transport_iscsi dm_mod video button battery ac uhci_hcd ehci_hcd shpchp e100 mii e1000 floppy sg ext3 jbd megaraid_mbox megaraid_mm sd_mod scsi_mod
Dec 13 16:22:56 node02 kernel: CPU: 3
Dec 13 16:22:56 node02 kernel: EIP: 0060:[<f8b66d09>] Tainted: GF VLI
Dec 13 16:22:56 node02 kernel: EFLAGS: 00010292 (2.6.13-1.1532_FC4smp)
Dec 13 16:22:56 node02 kernel: EIP is at process_cluster_request+0x9b9/0xdfa [dlm]
Dec 13 16:22:56 node02 kernel: eax: 00000004 ebx: 00000000 ecx: c036fc2c edx: 00000286
Dec 13 16:22:56 node02 kernel: esi: f6d35200 edi: 00000000 ebp: f6035ed4 esp: f6035e24
Dec 13 16:22:56 node02 kernel: ds: 007b es: 007b ss: 0068
Dec 13 16:22:56 node02 kernel: Process dlm_recvd (pid: 2939, threadinfo=f6035000 task=f7916020)
Dec 13 16:22:56 node02 kernel: Stack: badc0ded f8b73a44 00000001 f8b74a9c f8b73a40 00655a2f 00000001 00000040
Dec 13 16:22:56 node02 kernel: 00004000 f6035e48 00000000 c039f100 00001000 f3c4bb80 c02aff67 00001000
Dec 13 16:22:56 node02 kernel: 00004040 00000000 f8b6f617 00000000 00000001 ffffffff 00000000 f7ef84bc
Dec 13 16:22:56 node02 kernel: Call Trace:
Dec 13 16:22:56 node02 kernel: [<c02aff67>] sock_recvmsg+0x103/0x11e
Dec 13 16:22:56 node02 kernel: [<f8b6f617>] process_reply_async+0x1d/0x23 [dlm]
Dec 13 16:22:56 node02 kernel: [<f8b6a6d1>] copy_from_cb+0x25/0x5d [dlm]
Dec 13 16:22:56 node02 kernel: [<f8b6a95b>] midcomms_process_incoming_buffer+0x13b/0x25f [dlm]
Dec 13 16:22:56 node02 kernel: [<c02aff67>] sock_recvmsg+0x103/0x11e
Dec 13 16:22:56 node02 kernel: [<f8b6880f>] receive_from_sock+0x19b/0x2ce [dlm]
Dec 13 16:22:56 node02 kernel: [<c03166e3>] schedule+0x563/0xb8e
Dec 13 16:22:56 node02 kernel: [<c0105f15>] do_IRQ+0x55/0x86
Dec 13 16:22:56 node02 kernel: [<f8b69949>] dlm_recvd+0x0/0xa1 [dlm]
Dec 13 16:22:56 node02 kernel: [<f8b69777>] process_sockets+0x80/0xda [dlm]
Dec 13 16:22:56 node02 kernel: [<f8b699b9>] dlm_recvd+0x70/0xa1 [dlm]
Dec 13 16:22:56 node02 kernel: [<c01343d9>] kthread+0x93/0x97
Dec 13 16:22:56 node02 kernel: [<c0134346>] kthread+0x0/0x97
Dec 13 16:22:56 node02 kernel: [<c0101ca1>] kernel_thread_helper+0x5/0xb
Dec 13 16:22:56 node02 kernel: Code: 65 a9 5b c7 89 e8 e8 6a bd 00 00 8b 54 24 14 89 54 24 04 c7 04 24 96 3b b7 f8 e8 4a a9 5b c7 c7 04 24 44 3a b7 f8 e8 3e a9 5b c7 <0f> 0b ef 03 9c 4a b7 f8 c7 04 24 2c 4b b7 f8 e8 76 9f 5b c7 e8
Dec 13 16:22:56 node02 kernel: <0>Fatal exception: panic in 5 seconds
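
For anyone reading the "dlm: reply" dump in the log above, the large lockstate, nodeid and status values may be easier to inspect as 32-bit hex or signed numbers. Below is a minimal sketch in Python, assuming those three fields are printed as unsigned 32-bit decimals (the log itself does not say so). The helper names are made up for illustration, and the values are copied from the log.

```python
def as_u32_hex(value):
    """Format value as an 8-digit hex string, truncated to 32 bits."""
    return f"0x{value & 0xFFFFFFFF:08x}"


def as_s32(value):
    """Reinterpret the low 32 bits of value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


# Values copied from the reply dump; the 32-bit decimal reading is an assumption.
REPLY_FIELDS = [
    ("lockstate", 4137259392),
    ("nodeid", 3224043367),
    ("status", 4294901758),
]


def describe(fields):
    """Return (name, hex string, signed value) for each field."""
    return [(name, as_u32_hex(v), as_s32(v)) for name, v in fields]


for name, hex_value, signed_value in describe(REPLY_FIELDS):
    print(f"{name:10s} {hex_value} {signed_value}")
```

The hex forms can then be compared by eye against the addresses in the register and stack dump.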
Any help would be greatly appreciated.
- Jeff