Hello, Linux-cluster.
My GFS mount points in the cluster hang periodically (roughly once every two weeks), and I see this in my logs:
Jun 30 23:16:26 cluster kernel: grsec: From 87.245.147.2: denied resource overstep by requesting 100339712 for RLIMIT_STACK against limit 4194304 for /[cman_tool:13085] uid/euid:0/0 gid/egid:0/0, parent /bin/bash[bash:11531] uid/euid:0/0 gid/egid:0/0
Jun 30 23:16:26 cluster kernel: grsec: From 87.245.147.2: denied resource overstep by requesting 100339712 for RLIMIT_STACK against limit 4194304 for /[cman_tool:13085] uid/euid:0/0 gid/egid:0/0, parent /bin/bash[bash:11531] uid/euid:0/0 gid/egid:0/0
Jun 30 23:16:26 cluster kernel: CMAN: Waiting to join or form a Linux-cluster
Jun 30 23:16:30 cluster kernel: CMAN: sending membership request
Jun 30 23:16:31 cluster kernel: CMAN: got node node0
Jun 30 23:16:31 cluster kernel: CMAN: got node node1
Jun 30 23:17:01 cluster kernel: CMAN: Master died after JOINCONF, we must leave the cluster
Jun 30 23:17:01 cluster kernel: CMAN: we are leaving the cluster.
Jun 30 23:18:04 cluster kernel: grsec: From 87.245.147.2: denied resource overstep by requesting 111812608 for RLIMIT_STACK against limit 8388608 for /[cman_tool:16413] uid/euid:0/0 gid/egid:0/0, parent /bin/bash[bash:11531] uid/euid:0/0 gid/egid:0/0
Jun 30 23:18:04 cluster kernel: grsec: From 87.245.147.2: denied resource overstep by requesting 111812608 for RLIMIT_STACK against limit 8388608 for /[cman_tool:16413] uid/euid:0/0 gid/egid:0/0, parent /bin/bash[bash:11531] uid/euid:0/0 gid/egid:0/0
Jun 30 23:18:04 cluster kernel: CMAN: Waiting to join or form a Linux-cluster
Jun 30 23:18:05 cluster kernel: CMAN: sending membership request
Jun 30 23:18:06 cluster kernel: CMAN: got node node0
Jun 30 23:18:06 cluster kernel: CMAN: got node node1
Jun 30 23:18:36 cluster kernel: CMAN: Master died after JOINCONF, we must leave the cluster
Jun 30 23:18:36 cluster kernel: CMAN: we are leaving the cluster.
Jun 30 23:19:05 cluster kernel: CMAN: Waiting to join or form a Linux-cluster
Jun 30 23:19:05 cluster kernel: CMAN: sending membership request
Jun 30 23:19:06 cluster kernel: CMAN: got node node1
Jun 30 23:19:06 cluster kernel: CMAN: got node node0
Jun 30 23:19:27 cluster kernel: CMAN: node node0 has been removed from the cluster : Inconsistent cluster view
Jun 30 23:22:39 cluster kernel: CMAN: removing node node1 from the cluster : No response to messages
Jun 30 23:22:39 cluster kernel: ------------[ cut here ]------------
Jun 30 23:22:39 cluster kernel: kernel BUG at /home/Compile/GFS/cluster-1.02.00/cman-kernel/src/membership.c:3151!
Jun 30 23:22:39 cluster kernel: invalid opcode: 0000 [#1]
Jun 30 23:22:39 cluster kernel: Modules linked in: nfs lock_dlm dlm cman lock_harness nfsd exportfs lockd nfs_acl sunrpc ipt_REJECT ipt_multiport iptable_nat ip_nat ip_conntrack iptable_filter lm75 microcode dm_mod button battery ac uhci_hcd ehci_hcd i2c_i801 e1000 ext3 jbd 3w_xxxx
Jun 30 23:22:39 cluster kernel: CPU: 0
Jun 30 23:22:39 cluster kernel: EIP: 0060:[<f8aa95e6>] Tainted: GF VLI
Jun 30 23:22:39 cluster kernel: EFLAGS: 00010246 (2.6.16.20-grsec #8)
Jun 30 23:22:39 cluster kernel: eax: 00000000 ebx: 00000080 ecx: f8ab9000 edx: 00000080
Jun 30 23:22:39 cluster kernel: esi: d3352f64 edi: d3352fa0 ebp: 00000000 esp: d3352f58
Jun 30 23:22:39 cluster kernel: ds: 007b es: 007b ss: 0068
Jun 30 23:22:39 cluster kernel: Process cman_memb (pid: 7952, threadinfo=d3352000 task=c530e2b0)
Jun 30 23:22:39 cluster kernel: Stack: <0>f2b45920 f8aa12bc f8aaa9f9 f5458dc0 f8aa0712 00000001 f2b45920 f8aaaa9d
Jun 30 23:22:39 cluster kernel: c530e2b0 f8aa12e5 f8aad021 00000000 00000000 00000000 c530e2b0 c01473ba
Jun 30 23:22:39 cluster kernel: 00100100 00200200 0100001e 00000001 c01473ba 00100100 00200200 00000001
Jun 30 23:22:39 cluster kernel: Call Trace:
Jun 30 23:22:39 cluster kernel: [<f8aaa9f9>]
Jun 30 23:22:39 cluster kernel: [<f8aaaa9d>]
Jun 30 23:22:39 cluster kernel: [<f8aad021>]
Jun 30 23:22:39 cluster kernel: [<c01473ba>]
Jun 30 23:22:39 cluster kernel: [<c01473ba>]
Jun 30 23:22:39 cluster kernel: [<f8aac631>]
Jun 30 23:22:39 cluster kernel: [<c0131005>]
Jun 30 23:22:39 cluster kernel: Code: 1d f8 15 aa f8 8b 0d f4 15 aa f8 ba 01 00 00 00 eb 15 8b 04 91 85 c0 74 0d 83 78 1c 02 75 07 89 06 8b 40 14 eb 0f 42 39 da 7c e7 <0f> 0b 4f 0c 93 38 ab f8 31 c0 5b 5e c3 a3 3c 22 aa f8 b8 cc 15
------------------
And another one:
Jul 10 12:48:46 cluster kernel: grsec: From 83.166.231.248: denied resource overstep by requesting 57942016 for RLIMIT_STACK against limit 4194304 for /[cman_tool:11938] uid/guid:0/0 gid/egid:0/0, parent /bin/bash[bash:4524] uid/euid:0/0 gid/egid:0/0
Jul 10 12:48:46 cluster kernel: grsec: From 83.166.231.248: denied resource overstep by requesting 57942016 for RLIMIT_STACK against limit 4194304 for /[cman_tool:11938] uid/euid:0/0 gid/egid:0/0, parent /bin/bash[bash:4524] uid/euid:0/0 gid/egid:0/0
Jul 10 12:48:46 cluster kernel: CMAN: Waiting to join or form a Linux-cluster
Jul 10 12:48:48 cluster kernel: CMAN: sending membership request
Jul 10 12:48:48 cluster kernel: CMAN: sending membership request
Jul 10 12:48:48 cluster kernel: CMAN: got node node1
Jul 10 12:53:42 cluster kernel: CMAN: removing node node1 from the cluster : No response to messages
Jul 10 12:53:42 cluster kernel: ------------[ cut here ]------------
Jul 10 12:53:42 cluster kernel: kernel BUG at /home/Compile/GFS/cluster-1.02.00/cman-kernel/src/membership.c:3151!
Jul 10 12:53:42 cluster kernel: invalid opcode: 0000 [#1]
Jul 10 12:53:42 cluster kernel: Modules linked in: nfs gnbd lock_dlm dlm cman lock_harness nfsd exportfs lockd nfs_acl sunrpc ipt_REJECT ipt_multiport iptable_nat ip_nat ip_conntrack iptable_filter lm75 microcode dm_mod button battery ac uhci_hcd ehci_hcd i2c_i801 e1000 ext3 jbd 3w_xxxx
Jul 10 12:53:42 cluster kernel: CPU: 0
Jul 10 12:53:42 cluster kernel: EIP: 0060:[<f8aa95e6>] Tainted: GF VLI
Jul 10 12:53:42 cluster kernel: EFLAGS: 00010246 (2.6.16.20-grsec #8)
Jul 10 12:53:42 cluster kernel: eax: 00000000 ebx: 00000080 ecx: f8ab9000 edx: 00000080
Jul 10 12:53:42 cluster kernel: esi: c0722f64 edi: c0722fa0 ebp: 00000000 esp: c0722f58
Jul 10 12:53:42 cluster kernel: ds: 007b es: 007b ss: 0068
Jul 10 12:53:42 cluster kernel: Process cman_memb (pid: 31173, threadinfo=c0722000 task=d41e8910)
Jul 10 12:53:42 cluster kernel: Stack: <0>f6e45bc0 f8aa12bc f8aaa9f9 f6e630c0 f8aa0712 00000003 f6e45bc0 f8aaaa9d
Jul 10 12:53:42 cluster kernel: d41e8910 f8aa12e5 f8aad021 00000000 00000000 00000000 d41e8910 c01473ba
Jul 10 12:53:42 cluster kernel: 00100100 00200200 0100001e 00000003 c01473ba 00100100 00200200 00000001
Jul 10 12:53:42 cluster kernel: Call Trace:
Jul 10 12:53:42 cluster kernel: [<f8aaa9f9>]
Jul 10 12:53:42 cluster kernel: [<f8aaaa9d>]
Jul 10 12:53:42 cluster kernel: [<f8aad021>]
Jul 10 12:53:42 cluster kernel: [<c01473ba>]
Jul 10 12:53:42 cluster kernel: [<c01473ba>]
Jul 10 12:53:42 cluster kernel: [<f8aac631>]
Jul 10 12:53:42 cluster kernel: [<c0131005>]
Jul 10 12:53:42 cluster kernel: Code: 1d f8 15 aa f8 8b 0d f4 15 aa f8 ba 01 00 00 00 eb 15 8b 04 91 85 c0 74 0d 83 78 1c 02 75 07 89 06 8b 40 14 eb 0f 42 39 da 7c e7 <0f> 0b 4f 0c 93 38 ab f8 31 c0 5b 5e c3 a3 3c 22 aa f8 b8 cc 15
Jul 10 13:03:02 cluster kernel: releasing gnbd class
Jul 10 13:03:02 cluster kernel: releasing gnbd class
Jul 10 13:03:05 cluster last message repeated 126 times
When this happens, every request to the GFS mount point hangs forever, waiting for something, and 100% of CPU time is spent in the wait state.
At that point the servers that import the GNBD devices cannot be soft-rebooted or shut down either. Only a hard reset or power-off helps.
The dump I provided is from the main cluster node, which hosts the hard disks with the partition that I share over GNBD with GFS.
BTW, my kernel is patched with the grsecurity patch (as you can see at the top of the provided logs).
What is the solution? Why does cman_tool require a stack size of over 50 MB, and even over 100 MB?
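
To put numbers on the stack question, here is a minimal sketch that pulls the requested size and the limit out of the grsec messages above and converts them to MiB. The function names and the regular expression are my own, written only against the message format seen in these logs; this is not a grsecurity or cman tool.

```python
import re

# Matches the grsec "resource overstep" message format seen in the logs above.
# The pattern is an assumption based only on those lines.
GRSEC_STACK_RE = re.compile(
    r"denied resource overstep by requesting (\d+) "
    r"for RLIMIT_STACK against limit (\d+)"
)


def parse_stack_overstep(line):
    """Return (requested_bytes, limit_bytes), or None if the line does not match."""
    match = GRSEC_STACK_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def to_mib(num_bytes):
    """Convert a byte count to MiB (1 MiB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 * 1024)


if __name__ == "__main__":
    # Message fragments copied from the logs in this post.
    samples = [
        "grsec: From 87.245.147.2: denied resource overstep by requesting "
        "100339712 for RLIMIT_STACK against limit 4194304 for /[cman_tool:13085]",
        "grsec: From 87.245.147.2: denied resource overstep by requesting "
        "111812608 for RLIMIT_STACK against limit 8388608 for /[cman_tool:16413]",
        "grsec: From 83.166.231.248: denied resource overstep by requesting "
        "57942016 for RLIMIT_STACK against limit 4194304 for /[cman_tool:11938]",
    ]
    for line in samples:
        requested, limit = parse_stack_overstep(line)
        print(f"requested {to_mib(requested):.1f} MiB, limit {to_mib(limit):.1f} MiB")
```

For the three messages in this post, that works out to roughly 95.7, 106.6 and 55.3 MiB requested, against limits of 4 or 8 MiB.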
--
Best regards,
Flagman mailto:Flagman@xxxxxxxxxxx
-- Linux-cluster@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/linux-cluster