The config looks fine. Sorry, I should have specified: syslog, i.e. /var/log/messages.

On 08/19/2012 03:05 PM, Bart Verwilst wrote:
> <cluster name="kvm" config_version="13">
>   <logging debug="on"/>
>   <clusternodes>
>     <clusternode name="vm01-test" nodeid="1">
>       <fence>
>         <method name="apc">
>           <device name="apc01" port="1" action="off"/>
>           <device name="apc02" port="1" action="off"/>
>           <device name="apc01" port="1" action="on"/>
>           <device name="apc02" port="1" action="on"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="vm02-test" nodeid="2">
>       <fence>
>         <method name="apc">
>           <device name="apc01" port="8" action="off"/>
>           <device name="apc02" port="8" action="off"/>
>           <device name="apc01" port="8" action="on"/>
>           <device name="apc02" port="8" action="on"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="vm03-test" nodeid="3">
>       <fence>
>         <method name="apc">
>           <device name="apc01" port="2" action="off"/>
>           <device name="apc02" port="2" action="off"/>
>           <device name="apc01" port="2" action="on"/>
>           <device name="apc02" port="2" action="on"/>
>         </method>
>       </fence>
>     </clusternode>
>   </clusternodes>
>   <fencedevices>
>     <fencedevice agent="fence_apc" ipaddr="apc01" secure="on" login="device" name="apc01" passwd="xxx"/>
>     <fencedevice agent="fence_apc" ipaddr="apc02" secure="on" login="device" name="apc02" passwd="xxx"/>
>   </fencedevices>
>   <rm log_level="5">
>     <failoverdomains>
>       <failoverdomain name="any_node" nofailback="1" ordered="0" restricted="0"/>
>     </failoverdomains>
>     <vm domain="any_node" max_restarts="2" migrate="live" name="firewall" path="/etc/libvirt/qemu/" recovery="restart" restart_expire_time="600"/>
>     <vm domain="any_node" max_restarts="2" migrate="live" name="zabbix" path="/etc/libvirt/qemu/" recovery="restart" restart_expire_time="600"/>
>   </rm>
>   <totem rrp_mode="none" secauth="off"/>
>   <quorumd device="/dev/mapper/iscsi_cluster_quorum"></quorumd>
> </cluster>
>
> dlm_control.log*, fenced.log and corosync.log are either empty or only
> contain entries up to Aug 15, or (in the case of fenced.log) "Aug 18
> 23:41:46 fenced logging mode 3 syslog f 160 p 6 logfile p 7
> /var/log/cluster/fenced.log". I was expecting at least some log output,
> since everything seems to work fine. Maybe enabling more debugging would
> bring something up (but debugging is already on in cluster.conf).
>
> Kind regards,
>
> Bart
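Regarding the quiet daemon log files mentioned above: even when fenced.log and the dlm log files stay empty, fenced and dlm_controld keep in-memory debug buffers that can be dumped on demand. A minimal sketch, assuming the cluster 3.x fence_tool/dlm_tool utilities that ship with cman; subcommand behaviour may differ slightly between versions:

    # Dump the in-memory debug buffers of the fence and DLM daemons.
    # Run as root on the node that logged the hung tasks.
    fence_tool dump > /tmp/fenced.dump         # debug buffer from fenced
    dlm_tool dump   > /tmp/dlm_controld.dump   # debug buffer from dlm_controld

Comparing those dumps against the timestamps in /var/log/messages often shows whether the daemons saw anything at all around the time of the hang.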
> Digimer wrote on 19.08.2012 17:45:
>> On 08/19/2012 05:52 AM, Bart Verwilst wrote:
>>> Hi,
>>>
>>> I have a 3-node cluster in testing which seems to work quite well (cman,
>>> rgmanager, gfs2, etc.).
>>> On (only) one of my nodes, yesterday I noticed the message below in dmesg.
>>>
>>> I saw this 30 minutes after the fact. I could browse both my gfs2
>>> mounts, and there was no fencing or anything on any node.
>>>
>>> Any idea what might have caused this, and why it then went away?
>>>
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240067] INFO: task kworker/1:0:3117 blocked for more than 120 seconds.
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240296] kworker/1:0 D ffff88032fc93900 0 3117 2 0x00000000
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240302] ffff8802bb4dfb30 0000000000000046 ffff88031e4744d0 ffff8802bb4dffd8
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240307] ffff8802bb4dffd8 ffff8802bb4dffd8 ffff88031e5796f0 ffff88031e4744d0
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240311] 0000000000000286 ffff88032ffbd0f8 ffff8802bb4dfbc8 0000000000000002
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240316] Call Trace:
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240334] [<ffffffffa0570290>] ? gfs2_glock_demote_wait+0x20/0x20 [gfs2]
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240340] [<ffffffff81666c89>] schedule+0x29/0x70
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240349] [<ffffffffa057029e>] gfs2_glock_holder_wait+0xe/0x20 [gfs2]
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240352] [<ffffffff81665400>] __wait_on_bit+0x60/0x90
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240361] [<ffffffffa0570290>] ? gfs2_glock_demote_wait+0x20/0x20 [gfs2]
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240364] [<ffffffff816654ac>] out_of_line_wait_on_bit+0x7c/0x90
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240369] [<ffffffff81073400>] ? autoremove_wake_function+0x40/0x40
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240378] [<ffffffffa05713a7>] wait_on_holder+0x47/0x80 [gfs2]
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240388] [<ffffffffa05741d8>] gfs2_glock_nq+0x328/0x450 [gfs2]
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240399] [<ffffffffa058a8ca>] gfs2_check_blk_type+0x4a/0x150 [gfs2]
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240410] [<ffffffffa058a8c1>] ? gfs2_check_blk_type+0x41/0x150 [gfs2]
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240421] [<ffffffffa058ba0c>] gfs2_evict_inode+0x2cc/0x360 [gfs2]
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240432] [<ffffffffa058b842>] ? gfs2_evict_inode+0x102/0x360 [gfs2]
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240437] [<ffffffff811940c2>] evict+0xb2/0x1b0
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240440] [<ffffffff811942c9>] iput+0x109/0x210
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240448] [<ffffffffa0572fdc>] delete_work_func+0x5c/0x90 [gfs2]
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240453] [<ffffffff8106d5fa>] process_one_work+0x12a/0x420
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240462] [<ffffffffa0572f80>] ? gfs2_holder_uninit+0x40/0x40 [gfs2]
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240465] [<ffffffff8106e19e>] worker_thread+0x12e/0x2f0
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240469] [<ffffffff8106e070>] ? manage_workers.isra.25+0x200/0x200
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240472] [<ffffffff81072e73>] kthread+0x93/0xa0
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240477] [<ffffffff816710a4>] kernel_thread_helper+0x4/0x10
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240480] [<ffffffff81072de0>] ? flush_kthread_worker+0x80/0x80
>>> Aug 19 00:10:01 vm02-test kernel: [282120.240484] [<ffffffff816710a0>] ? gs_change+0x13/0x13
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240061] INFO: task kworker/1:0:3117 blocked for more than 120 seconds.
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240175] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240289] kworker/1:0 D ffff88032fc93900 0 3117 2 0x00000000
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240294] ffff8802bb4dfb30 0000000000000046 ffff88031e4744d0 ffff8802bb4dffd8
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240299] ffff8802bb4dffd8 ffff8802bb4dffd8 ffff88031e5796f0 ffff88031e4744d0
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240304] 0000000000000286 ffff88032ffbd0f8 ffff8802bb4dfbc8 0000000000000002
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240309] Call Trace:
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240326] [<ffffffffa0570290>] ? gfs2_glock_demote_wait+0x20/0x20 [gfs2]
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240332] [<ffffffff81666c89>] schedule+0x29/0x70
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240341] [<ffffffffa057029e>] gfs2_glock_holder_wait+0xe/0x20 [gfs2]
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240345] [<ffffffff81665400>] __wait_on_bit+0x60/0x90
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240353] [<ffffffffa0570290>] ? gfs2_glock_demote_wait+0x20/0x20 [gfs2]
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240357] [<ffffffff816654ac>] out_of_line_wait_on_bit+0x7c/0x90
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240362] [<ffffffff81073400>] ? autoremove_wake_function+0x40/0x40
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240371] [<ffffffffa05713a7>] wait_on_holder+0x47/0x80 [gfs2]
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240380] [<ffffffffa05741d8>] gfs2_glock_nq+0x328/0x450 [gfs2]
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240391] [<ffffffffa058a8ca>] gfs2_check_blk_type+0x4a/0x150 [gfs2]
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240402] [<ffffffffa058a8c1>] ? gfs2_check_blk_type+0x41/0x150 [gfs2]
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240413] [<ffffffffa058ba0c>] gfs2_evict_inode+0x2cc/0x360 [gfs2]
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240424] [<ffffffffa058b842>] ? gfs2_evict_inode+0x102/0x360 [gfs2]
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240429] [<ffffffff811940c2>] evict+0xb2/0x1b0
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240432] [<ffffffff811942c9>] iput+0x109/0x210
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240440] [<ffffffffa0572fdc>] delete_work_func+0x5c/0x90 [gfs2]
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240445] [<ffffffff8106d5fa>] process_one_work+0x12a/0x420
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240454] [<ffffffffa0572f80>] ? gfs2_holder_uninit+0x40/0x40 [gfs2]
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240458] [<ffffffff8106e19e>] worker_thread+0x12e/0x2f0
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240462] [<ffffffff8106e070>] ? manage_workers.isra.25+0x200/0x200
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240465] [<ffffffff81072e73>] kthread+0x93/0xa0
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240469] [<ffffffff816710a4>] kernel_thread_helper+0x4/0x10
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240473] [<ffffffff81072de0>] ? flush_kthread_worker+0x80/0x80
>>> Aug 19 00:12:01 vm02-test kernel: [282240.240476] [<ffffffff816710a0>] ? gs_change+0x13/0x13
>>> <snip, goes on for a while>
>>>
>>> Kind regards,
>>>
>>> Bart
>>
>> I usually see this when DLM is blocked. DLM usually blocks on a failed
>> fence action.
>>
>> To clarify: this comes up on only one of the three nodes? On the affected
>> node (the one with these messages), you shouldn't be able to access the
>> hung FS.
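If DLM blocking on a fence action is the suspicion, the state can be inspected directly while the hang is happening. A minimal sketch, assuming the cman/fenced/dlm_controld stack described above; the lockspace and filesystem names are placeholders to be filled in from the tool output:

    # Is the cluster quorate, and are all three nodes members?
    cman_tool status
    cman_tool nodes

    # Is fenced waiting on a victim? A pending or failed fence shows up here.
    fence_tool ls

    # Which DLM lockspaces exist, and what is blocked in one of them?
    dlm_tool ls
    dlm_tool lockdebug <lockspace_name>   # placeholder: pick a name from 'dlm_tool ls'

    # GFS2 glock state for the hung filesystem (debugfs must be mounted)
    mount -t debugfs none /sys/kernel/debug 2>/dev/null
    cat /sys/kernel/debug/gfs2/<clustername:fsname>/glocks

If fence_tool shows no pending victims and the lockspaces look clean once the hang clears, that would fit the "it went away on its own" behaviour described above.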
>>
>> Can you share your versions and cluster.conf please? Also, what is in the
>> logs in the three or four minutes before these messages start? Anything
>> interesting in the log files of the other nodes around the same time
>> period?
>>
>> digimer
>

-- 
Digimer
Papers and Projects: https://alteeve.ca
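For the versions and log context requested above, something along these lines should work on each node. A sketch only; package names, the package manager and the syslog path may differ per distribution:

    # Software versions
    uname -r
    cman_tool version
    corosync -v
    rpm -q cman corosync rgmanager gfs2-utils 2>/dev/null || \
      dpkg -l cman corosync rgmanager gfs2-utils

    # Cluster-related log context around the hang (00:06 to 00:12 on Aug 19)
    grep -E 'Aug 19 00:0[6-9]|Aug 19 00:1[0-2]' /var/log/messages | \
      grep -Ei 'fence|dlm|gfs2|corosync|rgmanager'

Running the same grep on the other two nodes makes it easy to compare what they saw around the same time period.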