Re: gfs2 blocking tasks

Bart Verwilst <lists@xxxxxxxxxxx> · Sun, 19 Aug 2012 23:34:45 +0200

Well nothing strange there either i think, nothing on for example 
vm01-test during that period, vm02-test gives me:

Aug 19 00:09:57 vm02-test corosync[7394]:   [CMAN  ] daemon: Returning 
command data. length = 1760
Aug 19 00:09:57 vm02-test corosync[7394]:   [CMAN  ] daemon: sending 
reply 40000007 to fd 17
Aug 19 00:09:57 vm02-test corosync[7394]:   [CMAN  ] daemon: read 20 
bytes from fd 17
Aug 19 00:09:57 vm02-test corosync[7394]:   [CMAN  ] daemon: client 
command is 800000b7
Aug 19 00:09:57 vm02-test corosync[7394]:   [CMAN  ] daemon: About to 
process command
Aug 19 00:09:57 vm02-test corosync[7394]:   [CMAN  ] memb: command to 
process is 800000b7
Aug 19 00:09:57 vm02-test corosync[7394]:   [CMAN  ] memb: command 
return code is 0
Aug 19 00:09:57 vm02-test corosync[7394]:   [CMAN  ] daemon: Returning 
command data. length = 0
Aug 19 00:09:57 vm02-test corosync[7394]:   [CMAN  ] daemon: sending 
reply c00000b7 to fd 17
Aug 19 00:09:57 vm02-test corosync[7394]:   [CMAN  ] memb: 
quorum_device_timer_fn
Aug 19 00:09:58 vm02-test corosync[7394]:   [CMAN  ] daemon: read 20 
bytes from fd 17
Aug 19 00:09:58 vm02-test corosync[7394]:   [CMAN  ] daemon: client 
command is 7
Aug 19 00:09:58 vm02-test corosync[7394]:   [CMAN  ] daemon: About to 
process command
Aug 19 00:09:58 vm02-test corosync[7394]:   [CMAN  ] memb: command to 
process is 7
Aug 19 00:09:58 vm02-test corosync[7394]:   [CMAN  ] memb: 
get_all_members: allocated new buffer (retsize=1024)
Aug 19 00:09:58 vm02-test corosync[7394]:   [CMAN  ] memb: 
get_all_members: retlen = 1760
Aug 19 00:09:58 vm02-test corosync[7394]:   [CMAN  ] memb: command 
return code is 4
Aug 19 00:09:58 vm02-test corosync[7394]:   [CMAN  ] daemon: Returning 
command data. length = 1760
Aug 19 00:09:58 vm02-test corosync[7394]:   [CMAN  ] daemon: sending 
reply 40000007 to fd 17
Aug 19 00:09:58 vm02-test corosync[7394]:   [CMAN  ] daemon: read 20 
bytes from fd 17
Aug 19 00:09:58 vm02-test corosync[7394]:   [CMAN  ] daemon: client 
command is 800000b7
Aug 19 00:09:58 vm02-test corosync[7394]:   [CMAN  ] daemon: About to 
process command
Aug 19 00:09:58 vm02-test corosync[7394]:   [CMAN  ] memb: command to 
process is 800000b7
Aug 19 00:09:58 vm02-test corosync[7394]:   [CMAN  ] memb: command 
return code is 0
Aug 19 00:09:58 vm02-test corosync[7394]:   [CMAN  ] daemon: Returning 
command data. length = 0
Aug 19 00:09:58 vm02-test corosync[7394]:   [CMAN  ] daemon: sending 
reply c00000b7 to fd 17
Aug 19 00:09:59 vm02-test corosync[7394]:   [CMAN  ] daemon: read 20 
bytes from fd 17
Aug 19 00:09:59 vm02-test corosync[7394]:   [CMAN  ] daemon: client 
command is 7
Aug 19 00:09:59 vm02-test corosync[7394]:   [CMAN  ] daemon: About to 
process command
Aug 19 00:09:59 vm02-test corosync[7394]:   [CMAN  ] memb: command to 
process is 7
Aug 19 00:09:59 vm02-test corosync[7394]:   [CMAN  ] memb: 
get_all_members: allocated new buffer (retsize=1024)
Aug 19 00:09:59 vm02-test corosync[7394]:   [CMAN  ] memb: 
get_all_members: retlen = 1760
Aug 19 00:09:59 vm02-test corosync[7394]:   [CMAN  ] memb: command 
return code is 4
Aug 19 00:09:59 vm02-test corosync[7394]:   [CMAN  ] daemon: Returning 
command data. length = 1760
Aug 19 00:09:59 vm02-test corosync[7394]:   [CMAN  ] daemon: sending 
reply 40000007 to fd 17
Aug 19 00:09:59 vm02-test corosync[7394]:   [CMAN  ] daemon: read 20 
bytes from fd 17
Aug 19 00:09:59 vm02-test corosync[7394]:   [CMAN  ] daemon: client 
command is 800000b7
Aug 19 00:09:59 vm02-test corosync[7394]:   [CMAN  ] daemon: About to 
process command
Aug 19 00:09:59 vm02-test corosync[7394]:   [CMAN  ] memb: command to 
process is 800000b7
Aug 19 00:09:59 vm02-test corosync[7394]:   [CMAN  ] memb: command 
return code is 0
Aug 19 00:09:59 vm02-test corosync[7394]:   [CMAN  ] daemon: Returning 
command data. length = 0
Aug 19 00:09:59 vm02-test corosync[7394]:   [CMAN  ] daemon: sending 
reply c00000b7 to fd 17
Aug 19 00:10:00 vm02-test corosync[7394]:   [CMAN  ] daemon: read 20 
bytes from fd 17
Aug 19 00:10:00 vm02-test corosync[7394]:   [CMAN  ] daemon: client 
command is 7
Aug 19 00:10:00 vm02-test corosync[7394]:   [CMAN  ] daemon: About to 
process command
Aug 19 00:10:00 vm02-test corosync[7394]:   [CMAN  ] memb: command to 
process is 7
Aug 19 00:10:00 vm02-test corosync[7394]:   [CMAN  ] memb: 
get_all_members: allocated new buffer (retsize=1024)
Aug 19 00:10:00 vm02-test corosync[7394]:   [CMAN  ] memb: 
get_all_members: retlen = 1760
Aug 19 00:10:00 vm02-test corosync[7394]:   [CMAN  ] memb: command 
return code is 4
Aug 19 00:10:00 vm02-test corosync[7394]:   [CMAN  ] daemon: Returning 
command data. length = 1760
Aug 19 00:10:00 vm02-test corosync[7394]:   [CMAN  ] daemon: sending 
reply 40000007 to fd 17
Aug 19 00:10:00 vm02-test corosync[7394]:   [CMAN  ] daemon: read 20 
bytes from fd 17
Aug 19 00:10:00 vm02-test corosync[7394]:   [CMAN  ] daemon: client 
command is 800000b7
Aug 19 00:10:00 vm02-test corosync[7394]:   [CMAN  ] daemon: About to 
process command
Aug 19 00:10:00 vm02-test corosync[7394]:   [CMAN  ] memb: command to 
process is 800000b7
Aug 19 00:10:00 vm02-test corosync[7394]:   [CMAN  ] memb: command 
return code is 0
Aug 19 00:10:00 vm02-test corosync[7394]:   [CMAN  ] daemon: Returning 
command data. length = 0
Aug 19 00:10:00 vm02-test corosync[7394]:   [CMAN  ] daemon: sending 
reply c00000b7 to fd 17
Aug 19 00:10:01 vm02-test corosync[7394]:   [CMAN  ] daemon: read 20 
bytes from fd 17
Aug 19 00:10:01 vm02-test corosync[7394]:   [CMAN  ] daemon: client 
command is 7
Aug 19 00:10:01 vm02-test corosync[7394]:   [CMAN  ] daemon: About to 
process command
Aug 19 00:10:01 vm02-test corosync[7394]:   [CMAN  ] memb: command to 
process is 7
Aug 19 00:10:01 vm02-test corosync[7394]:   [CMAN  ] memb: 
get_all_members: allocated new buffer (retsize=1024)
Aug 19 00:10:01 vm02-test corosync[7394]:   [CMAN  ] memb: 
get_all_members: retlen = 1760
Aug 19 00:10:01 vm02-test corosync[7394]:   [CMAN  ] memb: command 
return code is 4
Aug 19 00:10:01 vm02-test corosync[7394]:   [CMAN  ] daemon: Returning 
command data. length = 1760
Aug 19 00:10:01 vm02-test corosync[7394]:   [CMAN  ] daemon: sending 
reply 40000007 to fd 17
Aug 19 00:10:01 vm02-test corosync[7394]:   [CMAN  ] daemon: read 20 
bytes from fd 17
Aug 19 00:10:01 vm02-test corosync[7394]:   [CMAN  ] daemon: client 
command is 800000b7
Aug 19 00:10:01 vm02-test corosync[7394]:   [CMAN  ] daemon: About to 
process command
Aug 19 00:10:01 vm02-test corosync[7394]:   [CMAN  ] memb: command to 
process is 800000b7
Aug 19 00:10:01 vm02-test corosync[7394]:   [CMAN  ] memb: command 
return code is 0
Aug 19 00:10:01 vm02-test corosync[7394]:   [CMAN  ] daemon: Returning 
command data. length = 0
Aug 19 00:10:01 vm02-test corosync[7394]:   [CMAN  ] daemon: sending 
reply c00000b7 to fd 17
Aug 19 00:10:01 vm02-test kernel: [282120.240067] INFO: task 
kworker/1:0:3117 blocked for more than 120 seconds.
Aug 19 00:10:01 vm02-test kernel: [282120.240182] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 19 00:10:01 vm02-test kernel: [282120.240296] kworker/1:0     D 
ffff88032fc93900     0  3117      2 0x00000000
Aug 19 00:10:01 vm02-test kernel: [282120.240302]  ffff8802bb4dfb30 
0000000000000046 ffff88031e4744d0 ffff8802bb4dffd8
Aug 19 00:10:01 vm02-test kernel: [282120.240307]  ffff8802bb4dffd8 
ffff8802bb4dffd8 ffff88031e5796f0 ffff88031e4744d0
Aug 19 00:10:01 vm02-test kernel: [282120.240311]  0000000000000286 
ffff88032ffbd0f8 ffff8802bb4dfbc8 0000000000000002
Aug 19 00:10:01 vm02-test kernel: [282120.240316] Call Trace:
Aug 19 00:10:01 vm02-test kernel: [282120.240334]  [<ffffffffa0570290>] 
? gfs2_glock_demote_wait+0x20/0x20 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240340]  [<ffffffff81666c89>] 
schedule+0x29/0x70
Aug 19 00:10:01 vm02-test kernel: [282120.240349]  [<ffffffffa057029e>] 
gfs2_glock_holder_wait+0xe/0x20 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240352]  [<ffffffff81665400>] 
__wait_on_bit+0x60/0x90
Aug 19 00:10:01 vm02-test kernel: [282120.240361]  [<ffffffffa0570290>] 
? gfs2_glock_demote_wait+0x20/0x20 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240364]  [<ffffffff816654ac>] 
out_of_line_wait_on_bit+0x7c/0x90
Aug 19 00:10:01 vm02-test kernel: [282120.240369]  [<ffffffff81073400>] 
? autoremove_wake_function+0x40/0x40
Aug 19 00:10:01 vm02-test kernel: [282120.240378]  [<ffffffffa05713a7>] 
wait_on_holder+0x47/0x80 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240388]  [<ffffffffa05741d8>] 
gfs2_glock_nq+0x328/0x450 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240399]  [<ffffffffa058a8ca>] 
gfs2_check_blk_type+0x4a/0x150 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240410]  [<ffffffffa058a8c1>] 
? gfs2_check_blk_type+0x41/0x150 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240421]  [<ffffffffa058ba0c>] 
gfs2_evict_inode+0x2cc/0x360 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240432]  [<ffffffffa058b842>] 
? gfs2_evict_inode+0x102/0x360 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240437]  [<ffffffff811940c2>] 
evict+0xb2/0x1b0
Aug 19 00:10:01 vm02-test kernel: [282120.240440]  [<ffffffff811942c9>] 
iput+0x109/0x210
Aug 19 00:10:01 vm02-test kernel: [282120.240448]  [<ffffffffa0572fdc>] 
delete_work_func+0x5c/0x90 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240453]  [<ffffffff8106d5fa>] 
process_one_work+0x12a/0x420
Aug 19 00:10:01 vm02-test kernel: [282120.240462]  [<ffffffffa0572f80>] 
? gfs2_holder_uninit+0x40/0x40 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240465]  [<ffffffff8106e19e>] 
worker_thread+0x12e/0x2f0
Aug 19 00:10:01 vm02-test kernel: [282120.240469]  [<ffffffff8106e070>] 
? manage_workers.isra.25+0x200/0x200
Aug 19 00:10:01 vm02-test kernel: [282120.240472]  [<ffffffff81072e73>] 
kthread+0x93/0xa0
Aug 19 00:10:01 vm02-test kernel: [282120.240477]  [<ffffffff816710a4>] 
kernel_thread_helper+0x4/0x10
Aug 19 00:10:01 vm02-test kernel: [282120.240480]  [<ffffffff81072de0>] 
? flush_kthread_worker+0x80/0x80
Aug 19 00:10:01 vm02-test kernel: [282120.240484]  [<ffffffff816710a0>] 
? gs_change+0x13/0x13
Aug 19 00:10:02 vm02-test corosync[7394]:   [CMAN  ] daemon: read 20 
bytes from fd 17
Aug 19 00:10:02 vm02-test corosync[7394]:   [CMAN  ] daemon: client 
command is 7
Aug 19 00:10:02 vm02-test corosync[7394]:   [CMAN  ] daemon: About to 
process command
Aug 19 00:10:02 vm02-test corosync[7394]:   [CMAN  ] memb: command to 
process is 7
Aug 19 00:10:02 vm02-test corosync[7394]:   [CMAN  ] memb: 
get_all_members: allocated new buffer (retsize=1024)
Aug 19 00:10:02 vm02-test corosync[7394]:   [CMAN  ] memb: 
get_all_members: retlen = 1760
Aug 19 00:10:02 vm02-test corosync[7394]:   [CMAN  ] memb: command 
return code is 4
Aug 19 00:10:02 vm02-test corosync[7394]:   [CMAN  ] daemon: Returning 
command data. length = 1760
Aug 19 00:10:02 vm02-test corosync[7394]:   [CMAN  ] daemon: sending 
reply 40000007 to fd 17
Aug 19 00:10:02 vm02-test corosync[7394]:   [CMAN  ] daemon: read 20 
bytes from fd 17
Aug 19 00:10:02 vm02-test corosync[7394]:   [CMAN  ] daemon: client 
command is 800000b7
Aug 19 00:10:02 vm02-test corosync[7394]:   [CMAN  ] daemon: About to 
process command
Aug 19 00:10:02 vm02-test corosync[7394]:   [CMAN  ] memb: command to 
process is 800000b7
Aug 19 00:10:02 vm02-test corosync[7394]:   [CMAN  ] memb: command 
return code is 0
Aug 19 00:10:02 vm02-test corosync[7394]:   [CMAN  ] daemon: Returning 
command data. length = 0
Aug 19 00:10:02 vm02-test corosync[7394]:   [CMAN  ] daemon: sending 
reply c00000b7 to fd 17
Aug 19 00:10:03 vm02-test corosync[7394]:   [CMAN  ] daemon: read 20 
bytes from fd 17
Aug 19 00:10:03 vm02-test corosync[7394]:   [CMAN  ] daemon: client 
command is 7
Aug 19 00:10:03 vm02-test corosync[7394]:   [CMAN  ] daemon: About to 
process command
Aug 19 00:10:03 vm02-test corosync[7394]:   [CMAN  ] memb: command to 
process is 7
Aug 19 00:10:03 vm02-test corosync[7394]:   [CMAN  ] memb: 
get_all_members: allocated new buffer (retsize=1024)
Aug 19 00:10:03 vm02-test corosync[7394]:   [CMAN  ] memb: 
get_all_members: retlen = 1760
Aug 19 00:10:03 vm02-test corosync[7394]:   [CMAN  ] memb: command 
return code is 4
Aug 19 00:10:03 vm02-test corosync[7394]:   [CMAN  ] daemon: Returning 
command data. length = 1760
Aug 19 00:10:03 vm02-test corosync[7394]:   [CMAN  ] daemon: sending 
reply 40000007 to fd 17
Aug 19 00:10:03 vm02-test corosync[7394]:   [CMAN  ] daemon: read 20 
bytes from fd 17

Does this help in any way?

Kind regards,

Bart

Digimer schreef op 19.08.2012 22:24:
The config looks fine. Sorry, I should have specified syslog;
/var/log/messages.

On 08/19/2012 03:05 PM, Bart Verwilst wrote:
<cluster name="kvm" config_version="13">
    <logging debug="on"/>
        <clusternodes>
        <clusternode name="vm01-test" nodeid="1">
        <fence>
            <method name="apc">
                <device name="apc01" port="1" action="off"/>
                <device name="apc02" port="1" action="off"/>
                <device name="apc01" port="1" action="on"/>
                <device name="apc02" port="1" action="on"/>
            </method>
        </fence>
        </clusternode>
        <clusternode name="vm02-test" nodeid="2">
        <fence>
            <method name="apc">
                <device name="apc01" port="8" action="off"/>
                <device name="apc02" port="8" action="off"/>
                <device name="apc01" port="8" action="on"/>
                <device name="apc02" port="8" action="on"/>
            </method>
                </fence>
        </clusternode>
        <clusternode name="vm03-test" nodeid="3">
        <fence>
            <method name="apc">
                <device name="apc01" port="2" action="off"/>
                <device name="apc02" port="2" action="off"/>
                <device name="apc01" port="2" action="on"/>
                <device name="apc02" port="2" action="on"/>
            </method>
                </fence>
        </clusternode>
        </clusternodes>
    <fencedevices>
        <fencedevice agent="fence_apc" ipaddr="apc01" secure="on"
login="device" name="apc01" passwd="xxx"/>
        <fencedevice agent="fence_apc" ipaddr="apc02" secure="on"
login="device" name="apc02" passwd="xxx"/>
    </fencedevices>
    <rm log_level="5">
        <failoverdomains>
            <failoverdomain name="any_node" nofailback="1" 
ordered="0"
restricted="0"/>
        </failoverdomains>
        <vm domain="any_node" max_restarts="2" migrate="live"
name="firewall" path="/etc/libvirt/qemu/" recovery="restart"
restart_expire_time="600"/>
        <vm domain="any_node" max_restarts="2" migrate="live"
name="zabbix" path="/etc/libvirt/qemu/" recovery="restart"
restart_expire_time="600"/>
    </rm>
    <totem rrp_mode="none" secauth="off"/>
    <quorumd device="/dev/mapper/iscsi_cluster_quorum"></quorumd>
</cluster>

dlm_control.log*, fenced.log, corosync.log are either empty or just
contain entries until Aug 15 or ( in case of fenced.log ) "Aug 18
23:41:46 fenced logging mode 3 syslog f 160 p 6 logfile p 7
/var/log/cluster/fenced.log". I was expecting at least some log 
output,
since everything seems to work fine. Maybe enabling more debugging 
might
bring something up ( but debugging is already on @ cluster.conf )

Kind regards,

Bart

Digimer schreef op 19.08.2012 17:45:
On 08/19/2012 05:52 AM, Bart Verwilst wrote:
Hi,

I have a 3-node cluster in testing which seem to work quite well ( 
cman,
rgmanager, gfs2, etc ).
On (only) one of my nodes, yesterday i noticed the message below 
in
dmesg.

I saw this 30 minutes after the facts. I could browse both my gfs2
mounts, there was no fencing or anything on any node.

Any idea what might have caused this, and then go away?

Aug 19 00:10:01 vm02-test kernel: [282120.240067] INFO: task
kworker/1:0:3117 blocked for more than 120 seconds.
Aug 19 00:10:01 vm02-test kernel: [282120.240182] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 19 00:10:01 vm02-test kernel: [282120.240296] kworker/1:0     
D
ffff88032fc93900     0  3117      2 0x00000000
Aug 19 00:10:01 vm02-test kernel: [282120.240302]  
ffff8802bb4dfb30
0000000000000046 ffff88031e4744d0 ffff8802bb4dffd8
Aug 19 00:10:01 vm02-test kernel: [282120.240307]  
ffff8802bb4dffd8
ffff8802bb4dffd8 ffff88031e5796f0 ffff88031e4744d0
Aug 19 00:10:01 vm02-test kernel: [282120.240311]  
0000000000000286
ffff88032ffbd0f8 ffff8802bb4dfbc8 0000000000000002
Aug 19 00:10:01 vm02-test kernel: [282120.240316] Call Trace:
Aug 19 00:10:01 vm02-test kernel: [282120.240334]  
[<ffffffffa0570290>]
? gfs2_glock_demote_wait+0x20/0x20 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240340]  
[<ffffffff81666c89>]
schedule+0x29/0x70
Aug 19 00:10:01 vm02-test kernel: [282120.240349]  
[<ffffffffa057029e>]
gfs2_glock_holder_wait+0xe/0x20 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240352]  
[<ffffffff81665400>]
__wait_on_bit+0x60/0x90
Aug 19 00:10:01 vm02-test kernel: [282120.240361]  
[<ffffffffa0570290>]
? gfs2_glock_demote_wait+0x20/0x20 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240364]  
[<ffffffff816654ac>]
out_of_line_wait_on_bit+0x7c/0x90
Aug 19 00:10:01 vm02-test kernel: [282120.240369]  
[<ffffffff81073400>]
? autoremove_wake_function+0x40/0x40
Aug 19 00:10:01 vm02-test kernel: [282120.240378]  
[<ffffffffa05713a7>]
wait_on_holder+0x47/0x80 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240388]  
[<ffffffffa05741d8>]
gfs2_glock_nq+0x328/0x450 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240399]  
[<ffffffffa058a8ca>]
gfs2_check_blk_type+0x4a/0x150 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240410]  
[<ffffffffa058a8c1>]
? gfs2_check_blk_type+0x41/0x150 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240421]  
[<ffffffffa058ba0c>]
gfs2_evict_inode+0x2cc/0x360 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240432]  
[<ffffffffa058b842>]
? gfs2_evict_inode+0x102/0x360 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240437]  
[<ffffffff811940c2>]
evict+0xb2/0x1b0
Aug 19 00:10:01 vm02-test kernel: [282120.240440]  
[<ffffffff811942c9>]
iput+0x109/0x210
Aug 19 00:10:01 vm02-test kernel: [282120.240448]  
[<ffffffffa0572fdc>]
delete_work_func+0x5c/0x90 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240453]  
[<ffffffff8106d5fa>]
process_one_work+0x12a/0x420
Aug 19 00:10:01 vm02-test kernel: [282120.240462]  
[<ffffffffa0572f80>]
? gfs2_holder_uninit+0x40/0x40 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240465]  
[<ffffffff8106e19e>]
worker_thread+0x12e/0x2f0
Aug 19 00:10:01 vm02-test kernel: [282120.240469]  
[<ffffffff8106e070>]
? manage_workers.isra.25+0x200/0x200
Aug 19 00:10:01 vm02-test kernel: [282120.240472]  
[<ffffffff81072e73>]
kthread+0x93/0xa0
Aug 19 00:10:01 vm02-test kernel: [282120.240477]  
[<ffffffff816710a4>]
kernel_thread_helper+0x4/0x10
Aug 19 00:10:01 vm02-test kernel: [282120.240480]  
[<ffffffff81072de0>]
? flush_kthread_worker+0x80/0x80
Aug 19 00:10:01 vm02-test kernel: [282120.240484]  
[<ffffffff816710a0>]
? gs_change+0x13/0x13
Aug 19 00:12:01 vm02-test kernel: [282240.240061] INFO: task
kworker/1:0:3117 blocked for more than 120 seconds.
Aug 19 00:12:01 vm02-test kernel: [282240.240175] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 19 00:12:01 vm02-test kernel: [282240.240289] kworker/1:0     
D
ffff88032fc93900     0  3117      2 0x00000000
Aug 19 00:12:01 vm02-test kernel: [282240.240294]  
ffff8802bb4dfb30
0000000000000046 ffff88031e4744d0 ffff8802bb4dffd8
Aug 19 00:12:01 vm02-test kernel: [282240.240299]  
ffff8802bb4dffd8
ffff8802bb4dffd8 ffff88031e5796f0 ffff88031e4744d0
Aug 19 00:12:01 vm02-test kernel: [282240.240304]  
0000000000000286
ffff88032ffbd0f8 ffff8802bb4dfbc8 0000000000000002
Aug 19 00:12:01 vm02-test kernel: [282240.240309] Call Trace:
Aug 19 00:12:01 vm02-test kernel: [282240.240326]  
[<ffffffffa0570290>]
? gfs2_glock_demote_wait+0x20/0x20 [gfs2]
Aug 19 00:12:01 vm02-test kernel: [282240.240332]  
[<ffffffff81666c89>]
schedule+0x29/0x70
Aug 19 00:12:01 vm02-test kernel: [282240.240341]  
[<ffffffffa057029e>]
gfs2_glock_holder_wait+0xe/0x20 [gfs2]
Aug 19 00:12:01 vm02-test kernel: [282240.240345]  
[<ffffffff81665400>]
__wait_on_bit+0x60/0x90
Aug 19 00:12:01 vm02-test kernel: [282240.240353]  
[<ffffffffa0570290>]
? gfs2_glock_demote_wait+0x20/0x20 [gfs2]
Aug 19 00:12:01 vm02-test kernel: [282240.240357]  
[<ffffffff816654ac>]
out_of_line_wait_on_bit+0x7c/0x90
Aug 19 00:12:01 vm02-test kernel: [282240.240362]  
[<ffffffff81073400>]
? autoremove_wake_function+0x40/0x40
Aug 19 00:12:01 vm02-test kernel: [282240.240371]  
[<ffffffffa05713a7>]
wait_on_holder+0x47/0x80 [gfs2]
Aug 19 00:12:01 vm02-test kernel: [282240.240380]  
[<ffffffffa05741d8>]
gfs2_glock_nq+0x328/0x450 [gfs2]
Aug 19 00:12:01 vm02-test kernel: [282240.240391]  
[<ffffffffa058a8ca>]
gfs2_check_blk_type+0x4a/0x150 [gfs2]
Aug 19 00:12:01 vm02-test kernel: [282240.240402]  
[<ffffffffa058a8c1>]
? gfs2_check_blk_type+0x41/0x150 [gfs2]
Aug 19 00:12:01 vm02-test kernel: [282240.240413]  
[<ffffffffa058ba0c>]
gfs2_evict_inode+0x2cc/0x360 [gfs2]
Aug 19 00:12:01 vm02-test kernel: [282240.240424]  
[<ffffffffa058b842>]
? gfs2_evict_inode+0x102/0x360 [gfs2]
Aug 19 00:12:01 vm02-test kernel: [282240.240429]  
[<ffffffff811940c2>]
evict+0xb2/0x1b0
Aug 19 00:12:01 vm02-test kernel: [282240.240432]  
[<ffffffff811942c9>]
iput+0x109/0x210
Aug 19 00:12:01 vm02-test kernel: [282240.240440]  
[<ffffffffa0572fdc>]
delete_work_func+0x5c/0x90 [gfs2]
Aug 19 00:12:01 vm02-test kernel: [282240.240445]  
[<ffffffff8106d5fa>]
process_one_work+0x12a/0x420
Aug 19 00:12:01 vm02-test kernel: [282240.240454]  
[<ffffffffa0572f80>]
? gfs2_holder_uninit+0x40/0x40 [gfs2]
Aug 19 00:12:01 vm02-test kernel: [282240.240458]  
[<ffffffff8106e19e>]
worker_thread+0x12e/0x2f0
Aug 19 00:12:01 vm02-test kernel: [282240.240462]  
[<ffffffff8106e070>]
? manage_workers.isra.25+0x200/0x200
Aug 19 00:12:01 vm02-test kernel: [282240.240465]  
[<ffffffff81072e73>]
kthread+0x93/0xa0
Aug 19 00:12:01 vm02-test kernel: [282240.240469]  
[<ffffffff816710a4>]
kernel_thread_helper+0x4/0x10
Aug 19 00:12:01 vm02-test kernel: [282240.240473]  
[<ffffffff81072de0>]
? flush_kthread_worker+0x80/0x80
Aug 19 00:12:01 vm02-test kernel: [282240.240476]  
[<ffffffff816710a0>]
? gs_change+0x13/0x13
<snip, goes on for a while>

Kind regards,

Bart

I usually see this when DLM is blocked. DLM usually blocks on a 
failed
fence action.

To clarify; this comes up on one of the three nodes only? On the 
node
with these messages, you shouldn't be able to look at the hung FS 
on the
effected node.

Can you share you versions and cluster.conf please? Also, what is 
in the
logs in the three or four minutes before these messages start? 
Anything
interesting in the log files of the other nodes around the same 
time
period?

digimer

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster