Aug 19 00:09:57 vm02-test corosync[7394]: [CMAN ] daemon: Returning command data. length = 1760 Aug 19 00:09:57 vm02-test corosync[7394]: [CMAN ] daemon: sending reply 40000007 to fd 17 Aug 19 00:09:57 vm02-test corosync[7394]: [CMAN ] daemon: read 20 bytes from fd 17 Aug 19 00:09:57 vm02-test corosync[7394]: [CMAN ] daemon: client command is 800000b7 Aug 19 00:09:57 vm02-test corosync[7394]: [CMAN ] daemon: About to process command Aug 19 00:09:57 vm02-test corosync[7394]: [CMAN ] memb: command to process is 800000b7 Aug 19 00:09:57 vm02-test corosync[7394]: [CMAN ] memb: command return code is 0 Aug 19 00:09:57 vm02-test corosync[7394]: [CMAN ] daemon: Returning command data. length = 0 Aug 19 00:09:57 vm02-test corosync[7394]: [CMAN ] daemon: sending reply c00000b7 to fd 17 Aug 19 00:09:57 vm02-test corosync[7394]: [CMAN ] memb: quorum_device_timer_fn Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] daemon: read 20 bytes from fd 17 Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] daemon: client command is 7 Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] daemon: About to process command Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] memb: command to process is 7 Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] memb: get_all_members: allocated new buffer (retsize=1024) Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] memb: get_all_members: retlen = 1760 Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] memb: command return code is 4 Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] daemon: Returning command data. length = 1760 Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] daemon: sending reply 40000007 to fd 17 Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] daemon: read 20 bytes from fd 17 Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] daemon: client command is 800000b7 Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] daemon: About to process command Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] memb: command to process is 800000b7 Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] memb: command return code is 0 Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] daemon: Returning command data. length = 0 Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] daemon: sending reply c00000b7 to fd 17 Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] daemon: read 20 bytes from fd 17 Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] daemon: client command is 7 Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] daemon: About to process command Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] memb: command to process is 7 Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] memb: get_all_members: allocated new buffer (retsize=1024) Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] memb: get_all_members: retlen = 1760 Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] memb: command return code is 4 Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] daemon: Returning command data. length = 1760 Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] daemon: sending reply 40000007 to fd 17 Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] daemon: read 20 bytes from fd 17 Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] daemon: client command is 800000b7 Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] daemon: About to process command Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] memb: command to process is 800000b7 Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] memb: command return code is 0 Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] daemon: Returning command data. length = 0 Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] daemon: sending reply c00000b7 to fd 17 Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] daemon: read 20 bytes from fd 17 Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] daemon: client command is 7 Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] daemon: About to process command Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] memb: command to process is 7 Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] memb: get_all_members: allocated new buffer (retsize=1024) Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] memb: get_all_members: retlen = 1760 Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] memb: command return code is 4 Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] daemon: Returning command data. length = 1760 Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] daemon: sending reply 40000007 to fd 17 Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] daemon: read 20 bytes from fd 17 Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] daemon: client command is 800000b7 Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] daemon: About to process command Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] memb: command to process is 800000b7 Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] memb: command return code is 0 Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] daemon: Returning command data. length = 0 Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] daemon: sending reply c00000b7 to fd 17 Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] daemon: read 20 bytes from fd 17 Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] daemon: client command is 7 Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] daemon: About to process command Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] memb: command to process is 7 Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] memb: get_all_members: allocated new buffer (retsize=1024) Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] memb: get_all_members: retlen = 1760 Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] memb: command return code is 4 Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] daemon: Returning command data. length = 1760 Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] daemon: sending reply 40000007 to fd 17 Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] daemon: read 20 bytes from fd 17 Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] daemon: client command is 800000b7 Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] daemon: About to process command Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] memb: command to process is 800000b7 Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] memb: command return code is 0 Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] daemon: Returning command data. length = 0 Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] daemon: sending reply c00000b7 to fd 17 Aug 19 00:10:01 vm02-test kernel: [282120.240067] INFO: task kworker/1:0:3117 blocked for more than 120 seconds. Aug 19 00:10:01 vm02-test kernel: [282120.240182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Aug 19 00:10:01 vm02-test kernel: [282120.240296] kworker/1:0 D ffff88032fc93900 0 3117 2 0x00000000 Aug 19 00:10:01 vm02-test kernel: [282120.240302] ffff8802bb4dfb30 0000000000000046 ffff88031e4744d0 ffff8802bb4dffd8 Aug 19 00:10:01 vm02-test kernel: [282120.240307] ffff8802bb4dffd8 ffff8802bb4dffd8 ffff88031e5796f0 ffff88031e4744d0 Aug 19 00:10:01 vm02-test kernel: [282120.240311] 0000000000000286 ffff88032ffbd0f8 ffff8802bb4dfbc8 0000000000000002
Aug 19 00:10:01 vm02-test kernel: [282120.240316] Call Trace:Aug 19 00:10:01 vm02-test kernel: [282120.240334] [<ffffffffa0570290>] ? gfs2_glock_demote_wait+0x20/0x20 [gfs2] Aug 19 00:10:01 vm02-test kernel: [282120.240340] [<ffffffff81666c89>] schedule+0x29/0x70 Aug 19 00:10:01 vm02-test kernel: [282120.240349] [<ffffffffa057029e>] gfs2_glock_holder_wait+0xe/0x20 [gfs2] Aug 19 00:10:01 vm02-test kernel: [282120.240352] [<ffffffff81665400>] __wait_on_bit+0x60/0x90 Aug 19 00:10:01 vm02-test kernel: [282120.240361] [<ffffffffa0570290>] ? gfs2_glock_demote_wait+0x20/0x20 [gfs2] Aug 19 00:10:01 vm02-test kernel: [282120.240364] [<ffffffff816654ac>] out_of_line_wait_on_bit+0x7c/0x90 Aug 19 00:10:01 vm02-test kernel: [282120.240369] [<ffffffff81073400>] ? autoremove_wake_function+0x40/0x40 Aug 19 00:10:01 vm02-test kernel: [282120.240378] [<ffffffffa05713a7>] wait_on_holder+0x47/0x80 [gfs2] Aug 19 00:10:01 vm02-test kernel: [282120.240388] [<ffffffffa05741d8>] gfs2_glock_nq+0x328/0x450 [gfs2] Aug 19 00:10:01 vm02-test kernel: [282120.240399] [<ffffffffa058a8ca>] gfs2_check_blk_type+0x4a/0x150 [gfs2] Aug 19 00:10:01 vm02-test kernel: [282120.240410] [<ffffffffa058a8c1>] ? gfs2_check_blk_type+0x41/0x150 [gfs2] Aug 19 00:10:01 vm02-test kernel: [282120.240421] [<ffffffffa058ba0c>] gfs2_evict_inode+0x2cc/0x360 [gfs2] Aug 19 00:10:01 vm02-test kernel: [282120.240432] [<ffffffffa058b842>] ? gfs2_evict_inode+0x102/0x360 [gfs2] Aug 19 00:10:01 vm02-test kernel: [282120.240437] [<ffffffff811940c2>] evict+0xb2/0x1b0 Aug 19 00:10:01 vm02-test kernel: [282120.240440] [<ffffffff811942c9>] iput+0x109/0x210 Aug 19 00:10:01 vm02-test kernel: [282120.240448] [<ffffffffa0572fdc>] delete_work_func+0x5c/0x90 [gfs2] Aug 19 00:10:01 vm02-test kernel: [282120.240453] [<ffffffff8106d5fa>] process_one_work+0x12a/0x420 Aug 19 00:10:01 vm02-test kernel: [282120.240462] [<ffffffffa0572f80>] ? gfs2_holder_uninit+0x40/0x40 [gfs2] Aug 19 00:10:01 vm02-test kernel: [282120.240465] [<ffffffff8106e19e>] worker_thread+0x12e/0x2f0 Aug 19 00:10:01 vm02-test kernel: [282120.240469] [<ffffffff8106e070>] ? manage_workers.isra.25+0x200/0x200 Aug 19 00:10:01 vm02-test kernel: [282120.240472] [<ffffffff81072e73>] kthread+0x93/0xa0 Aug 19 00:10:01 vm02-test kernel: [282120.240477] [<ffffffff816710a4>] kernel_thread_helper+0x4/0x10 Aug 19 00:10:01 vm02-test kernel: [282120.240480] [<ffffffff81072de0>] ? flush_kthread_worker+0x80/0x80 Aug 19 00:10:01 vm02-test kernel: [282120.240484] [<ffffffff816710a0>] ? gs_change+0x13/0x13 Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] daemon: read 20 bytes from fd 17 Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] daemon: client command is 7 Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] daemon: About to process command Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] memb: command to process is 7 Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] memb: get_all_members: allocated new buffer (retsize=1024) Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] memb: get_all_members: retlen = 1760 Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] memb: command return code is 4 Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] daemon: Returning command data. length = 1760 Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] daemon: sending reply 40000007 to fd 17 Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] daemon: read 20 bytes from fd 17 Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] daemon: client command is 800000b7 Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] daemon: About to process command Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] memb: command to process is 800000b7 Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] memb: command return code is 0 Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] daemon: Returning command data. length = 0 Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] daemon: sending reply c00000b7 to fd 17 Aug 19 00:10:03 vm02-test corosync[7394]: [CMAN ] daemon: read 20 bytes from fd 17 Aug 19 00:10:03 vm02-test corosync[7394]: [CMAN ] daemon: client command is 7 Aug 19 00:10:03 vm02-test corosync[7394]: [CMAN ] daemon: About to process command Aug 19 00:10:03 vm02-test corosync[7394]: [CMAN ] memb: command to process is 7 Aug 19 00:10:03 vm02-test corosync[7394]: [CMAN ] memb: get_all_members: allocated new buffer (retsize=1024) Aug 19 00:10:03 vm02-test corosync[7394]: [CMAN ] memb: get_all_members: retlen = 1760 Aug 19 00:10:03 vm02-test corosync[7394]: [CMAN ] memb: command return code is 4 Aug 19 00:10:03 vm02-test corosync[7394]: [CMAN ] daemon: Returning command data. length = 1760 Aug 19 00:10:03 vm02-test corosync[7394]: [CMAN ] daemon: sending reply 40000007 to fd 17 Aug 19 00:10:03 vm02-test corosync[7394]: [CMAN ] daemon: read 20 bytes from fd 17
Does this help in any way? Kind regards, Bart Digimer schreef op 19.08.2012 22:24:
The config looks fine. Sorry, I should have specified syslog; /var/log/messages. On 08/19/2012 03:05 PM, Bart Verwilst wrote:<cluster name="kvm" config_version="13"> <logging debug="on"/> <clusternodes> <clusternode name="vm01-test" nodeid="1"> <fence> <method name="apc"> <device name="apc01" port="1" action="off"/> <device name="apc02" port="1" action="off"/> <device name="apc01" port="1" action="on"/> <device name="apc02" port="1" action="on"/> </method> </fence> </clusternode> <clusternode name="vm02-test" nodeid="2"> <fence> <method name="apc"> <device name="apc01" port="8" action="off"/> <device name="apc02" port="8" action="off"/> <device name="apc01" port="8" action="on"/> <device name="apc02" port="8" action="on"/> </method> </fence> </clusternode> <clusternode name="vm03-test" nodeid="3"> <fence> <method name="apc"> <device name="apc01" port="2" action="off"/> <device name="apc02" port="2" action="off"/> <device name="apc01" port="2" action="on"/> <device name="apc02" port="2" action="on"/> </method> </fence> </clusternode> </clusternodes> <fencedevices> <fencedevice agent="fence_apc" ipaddr="apc01" secure="on" login="device" name="apc01" passwd="xxx"/> <fencedevice agent="fence_apc" ipaddr="apc02" secure="on" login="device" name="apc02" passwd="xxx"/> </fencedevices> <rm log_level="5"> <failoverdomains><failoverdomain name="any_node" nofailback="1" ordered="0"restricted="0"/> </failoverdomains> <vm domain="any_node" max_restarts="2" migrate="live" name="firewall" path="/etc/libvirt/qemu/" recovery="restart" restart_expire_time="600"/> <vm domain="any_node" max_restarts="2" migrate="live" name="zabbix" path="/etc/libvirt/qemu/" recovery="restart" restart_expire_time="600"/> </rm> <totem rrp_mode="none" secauth="off"/> <quorumd device="/dev/mapper/iscsi_cluster_quorum"></quorumd> </cluster> dlm_control.log*, fenced.log, corosync.log are either empty or just contain entries until Aug 15 or ( in case of fenced.log ) "Aug 18 23:41:46 fenced logging mode 3 syslog f 160 p 6 logfile p 7/var/log/cluster/fenced.log". I was expecting at least some log output, since everything seems to work fine. Maybe enabling more debugging mightbring something up ( but debugging is already on @ cluster.conf ) Kind regards, Bart Digimer schreef op 19.08.2012 17:45:On 08/19/2012 05:52 AM, Bart Verwilst wrote:Hi,I have a 3-node cluster in testing which seem to work quite well ( cman,rgmanager, gfs2, etc ).On (only) one of my nodes, yesterday i noticed the message below indmesg. I saw this 30 minutes after the facts. I could browse both my gfs2 mounts, there was no fencing or anything on any node. Any idea what might have caused this, and then go away? Aug 19 00:10:01 vm02-test kernel: [282120.240067] INFO: task kworker/1:0:3117 blocked for more than 120 seconds. Aug 19 00:10:01 vm02-test kernel: [282120.240182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.Aug 19 00:10:01 vm02-test kernel: [282120.240296] kworker/1:0 Dffff88032fc93900 0 3117 2 0x00000000Aug 19 00:10:01 vm02-test kernel: [282120.240302] ffff8802bb4dfb300000000000000046 ffff88031e4744d0 ffff8802bb4dffd8Aug 19 00:10:01 vm02-test kernel: [282120.240307] ffff8802bb4dffd8ffff8802bb4dffd8 ffff88031e5796f0 ffff88031e4744d0Aug 19 00:10:01 vm02-test kernel: [282120.240311] 0000000000000286ffff88032ffbd0f8 ffff8802bb4dfbc8 0000000000000002 Aug 19 00:10:01 vm02-test kernel: [282120.240316] Call Trace:Aug 19 00:10:01 vm02-test kernel: [282120.240334] [<ffffffffa0570290>]? gfs2_glock_demote_wait+0x20/0x20 [gfs2]Aug 19 00:10:01 vm02-test kernel: [282120.240340] [<ffffffff81666c89>]schedule+0x29/0x70Aug 19 00:10:01 vm02-test kernel: [282120.240349] [<ffffffffa057029e>]gfs2_glock_holder_wait+0xe/0x20 [gfs2]Aug 19 00:10:01 vm02-test kernel: [282120.240352] [<ffffffff81665400>]__wait_on_bit+0x60/0x90Aug 19 00:10:01 vm02-test kernel: [282120.240361] [<ffffffffa0570290>]? gfs2_glock_demote_wait+0x20/0x20 [gfs2]Aug 19 00:10:01 vm02-test kernel: [282120.240364] [<ffffffff816654ac>]out_of_line_wait_on_bit+0x7c/0x90Aug 19 00:10:01 vm02-test kernel: [282120.240369] [<ffffffff81073400>]? autoremove_wake_function+0x40/0x40Aug 19 00:10:01 vm02-test kernel: [282120.240378] [<ffffffffa05713a7>]wait_on_holder+0x47/0x80 [gfs2]Aug 19 00:10:01 vm02-test kernel: [282120.240388] [<ffffffffa05741d8>]gfs2_glock_nq+0x328/0x450 [gfs2]Aug 19 00:10:01 vm02-test kernel: [282120.240399] [<ffffffffa058a8ca>]gfs2_check_blk_type+0x4a/0x150 [gfs2]Aug 19 00:10:01 vm02-test kernel: [282120.240410] [<ffffffffa058a8c1>]? gfs2_check_blk_type+0x41/0x150 [gfs2]Aug 19 00:10:01 vm02-test kernel: [282120.240421] [<ffffffffa058ba0c>]gfs2_evict_inode+0x2cc/0x360 [gfs2]Aug 19 00:10:01 vm02-test kernel: [282120.240432] [<ffffffffa058b842>]? gfs2_evict_inode+0x102/0x360 [gfs2]Aug 19 00:10:01 vm02-test kernel: [282120.240437] [<ffffffff811940c2>]evict+0xb2/0x1b0Aug 19 00:10:01 vm02-test kernel: [282120.240440] [<ffffffff811942c9>]iput+0x109/0x210Aug 19 00:10:01 vm02-test kernel: [282120.240448] [<ffffffffa0572fdc>]delete_work_func+0x5c/0x90 [gfs2]Aug 19 00:10:01 vm02-test kernel: [282120.240453] [<ffffffff8106d5fa>]process_one_work+0x12a/0x420Aug 19 00:10:01 vm02-test kernel: [282120.240462] [<ffffffffa0572f80>]? gfs2_holder_uninit+0x40/0x40 [gfs2]Aug 19 00:10:01 vm02-test kernel: [282120.240465] [<ffffffff8106e19e>]worker_thread+0x12e/0x2f0Aug 19 00:10:01 vm02-test kernel: [282120.240469] [<ffffffff8106e070>]? manage_workers.isra.25+0x200/0x200Aug 19 00:10:01 vm02-test kernel: [282120.240472] [<ffffffff81072e73>]kthread+0x93/0xa0Aug 19 00:10:01 vm02-test kernel: [282120.240477] [<ffffffff816710a4>]kernel_thread_helper+0x4/0x10Aug 19 00:10:01 vm02-test kernel: [282120.240480] [<ffffffff81072de0>]? flush_kthread_worker+0x80/0x80Aug 19 00:10:01 vm02-test kernel: [282120.240484] [<ffffffff816710a0>]? gs_change+0x13/0x13 Aug 19 00:12:01 vm02-test kernel: [282240.240061] INFO: task kworker/1:0:3117 blocked for more than 120 seconds. Aug 19 00:12:01 vm02-test kernel: [282240.240175] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.Aug 19 00:12:01 vm02-test kernel: [282240.240289] kworker/1:0 Dffff88032fc93900 0 3117 2 0x00000000Aug 19 00:12:01 vm02-test kernel: [282240.240294] ffff8802bb4dfb300000000000000046 ffff88031e4744d0 ffff8802bb4dffd8Aug 19 00:12:01 vm02-test kernel: [282240.240299] ffff8802bb4dffd8ffff8802bb4dffd8 ffff88031e5796f0 ffff88031e4744d0Aug 19 00:12:01 vm02-test kernel: [282240.240304] 0000000000000286ffff88032ffbd0f8 ffff8802bb4dfbc8 0000000000000002 Aug 19 00:12:01 vm02-test kernel: [282240.240309] Call Trace:Aug 19 00:12:01 vm02-test kernel: [282240.240326] [<ffffffffa0570290>]? gfs2_glock_demote_wait+0x20/0x20 [gfs2]Aug 19 00:12:01 vm02-test kernel: [282240.240332] [<ffffffff81666c89>]schedule+0x29/0x70Aug 19 00:12:01 vm02-test kernel: [282240.240341] [<ffffffffa057029e>]gfs2_glock_holder_wait+0xe/0x20 [gfs2]Aug 19 00:12:01 vm02-test kernel: [282240.240345] [<ffffffff81665400>]__wait_on_bit+0x60/0x90Aug 19 00:12:01 vm02-test kernel: [282240.240353] [<ffffffffa0570290>]? gfs2_glock_demote_wait+0x20/0x20 [gfs2]Aug 19 00:12:01 vm02-test kernel: [282240.240357] [<ffffffff816654ac>]out_of_line_wait_on_bit+0x7c/0x90Aug 19 00:12:01 vm02-test kernel: [282240.240362] [<ffffffff81073400>]? autoremove_wake_function+0x40/0x40Aug 19 00:12:01 vm02-test kernel: [282240.240371] [<ffffffffa05713a7>]wait_on_holder+0x47/0x80 [gfs2]Aug 19 00:12:01 vm02-test kernel: [282240.240380] [<ffffffffa05741d8>]gfs2_glock_nq+0x328/0x450 [gfs2]Aug 19 00:12:01 vm02-test kernel: [282240.240391] [<ffffffffa058a8ca>]gfs2_check_blk_type+0x4a/0x150 [gfs2]Aug 19 00:12:01 vm02-test kernel: [282240.240402] [<ffffffffa058a8c1>]? gfs2_check_blk_type+0x41/0x150 [gfs2]Aug 19 00:12:01 vm02-test kernel: [282240.240413] [<ffffffffa058ba0c>]gfs2_evict_inode+0x2cc/0x360 [gfs2]Aug 19 00:12:01 vm02-test kernel: [282240.240424] [<ffffffffa058b842>]? gfs2_evict_inode+0x102/0x360 [gfs2]Aug 19 00:12:01 vm02-test kernel: [282240.240429] [<ffffffff811940c2>]evict+0xb2/0x1b0Aug 19 00:12:01 vm02-test kernel: [282240.240432] [<ffffffff811942c9>]iput+0x109/0x210Aug 19 00:12:01 vm02-test kernel: [282240.240440] [<ffffffffa0572fdc>]delete_work_func+0x5c/0x90 [gfs2]Aug 19 00:12:01 vm02-test kernel: [282240.240445] [<ffffffff8106d5fa>]process_one_work+0x12a/0x420Aug 19 00:12:01 vm02-test kernel: [282240.240454] [<ffffffffa0572f80>]? gfs2_holder_uninit+0x40/0x40 [gfs2]Aug 19 00:12:01 vm02-test kernel: [282240.240458] [<ffffffff8106e19e>]worker_thread+0x12e/0x2f0Aug 19 00:12:01 vm02-test kernel: [282240.240462] [<ffffffff8106e070>]? manage_workers.isra.25+0x200/0x200Aug 19 00:12:01 vm02-test kernel: [282240.240465] [<ffffffff81072e73>]kthread+0x93/0xa0Aug 19 00:12:01 vm02-test kernel: [282240.240469] [<ffffffff816710a4>]kernel_thread_helper+0x4/0x10Aug 19 00:12:01 vm02-test kernel: [282240.240473] [<ffffffff81072de0>]? flush_kthread_worker+0x80/0x80Aug 19 00:12:01 vm02-test kernel: [282240.240476] [<ffffffff816710a0>]? gs_change+0x13/0x13 <snip, goes on for a while> Kind regards, BartI usually see this when DLM is blocked. DLM usually blocks on a failedfence action.To clarify; this comes up on one of the three nodes only? On the node with these messages, you shouldn't be able to look at the hung FS on theeffected node.Can you share you versions and cluster.conf please? Also, what is in the logs in the three or four minutes before these messages start? Anything interesting in the log files of the other nodes around the same timeperiod? digimer
-- Linux-cluster mailing list Linux-cluster@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/linux-cluster