Hello,

I have been testing NFS over RBD recently. I am building an NFS HA environment on Ubuntu 14.04 for testing, with the following package versions:

- Ubuntu 14.04 : 3.13.0-32-generic (Ubuntu 14.04.2 LTS)
- ceph : 0.80.9-0ubuntu0.14.04.2
- ceph-common : 0.80.9-0ubuntu0.14.04.2
- pacemaker : git20130802-1ubuntu2.3
- corosync : 2.3.3-1ubuntu1

PS: I also tried ceph/ceph-common 0.87.1-1trusty and 0.87.2-1trusty on a 3.13.0-48-generic (Ubuntu 14.04.2) server and ran into the same situation.

The environment has five nodes in the Ceph cluster (3 MONs and 5 OSDs) plus two NFS gateways (nfs1 and nfs2) for high availability. To force the resources to stop on 'nfs1' and move to 'nfs2' (and vice versa), I issue 'sudo service pacemaker stop' on the node that currently holds them. When both nodes are up and I issue 'sudo service pacemaker stop' on one of them, the other node takes over all the resources and everything looks fine.
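For context, each failover test is roughly the following sequence (a sketch; bringing the stopped node back with 'service pacemaker start' is my assumed counterpart of the stop command quoted above):

# On nfs1: stop Pacemaker so its resources fail over to nfs2
nfs1$ sudo service pacemaker stop

# On nfs2: confirm that all resources have been taken over
nfs2$ sudo crm_mon -1

# Bring nfs1 back online before testing the opposite direction
nfs1$ sudo service pacemaker start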
Then I waited about 30 minutes without doing anything to the NFS gateways and repeated the previous steps to test the failover procedure. This time the 'umount' process got stuck in state 'D' (uninterruptible sleep); 'ps' showed the following:

root     21047  0.0  0.0  17412   952 ?        D    16:39   0:00 umount /mnt/block1

Does anyone have an idea how to solve or work around this? Because 'umount' is stuck, neither 'reboot' nor 'shutdown' works properly, so unless I wait about 20 minutes for 'umount' to time out, the only thing I can do is power the server off directly. Any help would be much appreciated.
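If it helps, the stuck process can be inspected a bit further with something like the following (a sketch; 21047 is the umount PID from the 'ps' output above, and the sysrq line assumes kernel.sysrq is enabled):

# Kernel stack of the stuck umount process
sudo cat /proc/21047/stack

# Dump all uninterruptible (D-state) tasks to the kernel log
echo w | sudo tee /proc/sysrq-trigger

# Check whether the RBD mapping and its in-flight OSD requests look healthy
sudo rbd showmapped
sudo cat /sys/kernel/debug/ceph/*/osdc    # requires debugfs to be mounted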
My Pacemaker configuration and logs are attached below.

================================================================
Pacemaker configuration:

crm configure primitive p_rbd_map_1 ocf:ceph:rbd.in \
  params user="admin" pool="block_data" name="data01" cephconf="/etc/ceph/ceph.conf" \
  op monitor interval="10s" timeout="20s"

crm configure primitive p_fs_rbd_1 ocf:heartbeat:Filesystem \
  params directory="/mnt/block1" fstype="xfs" device="/dev/rbd1" \
  fast_stop="no" options="noatime,nodiratime,nobarrier,inode64" \
  op monitor interval="20s" timeout="40s" \
  op start interval="0" timeout="60s" \
  op stop interval="0" timeout="60s"

crm configure primitive p_export_rbd_1 ocf:heartbeat:exportfs \
  params directory="/mnt/block1" clientspec="10.35.64.0/24" options="rw,async,no_subtree_check,no_root_squash" fsid="1" \
  op monitor interval="10s" timeout="20s" \
  op start interval="0" timeout="40s"

crm configure primitive p_vip_1 ocf:heartbeat:IPaddr2 \
  params ip="10.35.64.90" cidr_netmask="24" \
  op monitor interval="5"

crm configure primitive p_nfs_server lsb:nfs-kernel-server \
  op monitor interval="10s" timeout="30s"

crm configure primitive p_rpcbind upstart:rpcbind \
  op monitor interval="10s" timeout="30s"

crm configure group g_rbd_share_1 p_rbd_map_1 p_fs_rbd_1 p_export_rbd_1 p_vip_1 \
  meta target-role="Started"

crm configure group g_nfs p_rpcbind p_nfs_server \
  meta target-role="Started"

crm configure clone clo_nfs g_nfs \
  meta globally-unique="false" target-role="Started"

================================================================
'crm_mon' status in the normal condition:

Online: [ nfs1 nfs2 ]

 Resource Group: g_rbd_share_1
     p_rbd_map_1     (ocf::ceph:rbd.in):           Started nfs1
     p_fs_rbd_1      (ocf::heartbeat:Filesystem):  Started nfs1
     p_export_rbd_1  (ocf::heartbeat:exportfs):    Started nfs1
     p_vip_1         (ocf::heartbeat:IPaddr2):     Started nfs1
 Clone Set: clo_nfs [g_nfs]
     Started: [ nfs1 nfs2 ]

'crm_mon' status in the failover condition:

Online: [ nfs1 nfs2 ]

 Resource Group: g_rbd_share_1
     p_rbd_map_1     (ocf::ceph:rbd.in):           Started nfs1
     p_fs_rbd_1      (ocf::heartbeat:Filesystem):  Started nfs1 (unmanaged) FAILED
     p_export_rbd_1  (ocf::heartbeat:exportfs):    Stopped
     p_vip_1         (ocf::heartbeat:IPaddr2):     Stopped
 Clone Set: clo_nfs [g_nfs]
     Started: [ nfs2 ]
     Stopped: [ nfs1 ]

Failed actions:
    p_fs_rbd_1_stop_0 (node=nfs1, call=114, rc=1, status=Timed Out, last-rc-change=Wed May 13 16:39:10 2015, queued=60002ms, exec=1ms): unknown error
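As far as I understand, because the stop action on p_fs_rbd_1 timed out and no fencing is configured, Pacemaker leaves the resource in the '(unmanaged) FAILED' state shown above; once the node has been recovered, the failure record could presumably be cleared with something like:

sudo crm resource cleanup p_fs_rbd_1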
================================================================
'dmesg' messages:

[ 9470.284509] nfsd: last server has exited, flushing export cache
[ 9470.322893] init: rpcbind main process (4267) terminated with status 2
[ 9600.520281] INFO: task umount:2675 blocked for more than 120 seconds.
[ 9600.520445]       Not tainted 3.13.0-32-generic #57-Ubuntu
[ 9600.520570] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9600.520792] umount          D ffff88003fc13480     0  2675      1 0x00000000
[ 9600.520800]  ffff88003a4f9dc0 0000000000000082 ffff880039ece000 ffff88003a4f9fd8
[ 9600.520805]  0000000000013480 0000000000013480 ffff880039ece000 ffff880039ece000
[ 9600.520809]  ffff88003fc141a0 0000000000000001 0000000000000000 ffff88003a377928
[ 9600.520814] Call Trace:
[ 9600.520830]  [<ffffffff817251a9>] schedule+0x29/0x70
[ 9600.520882]  [<ffffffffa043b300>] _xfs_log_force+0x220/0x280 [xfs]
[ 9600.520891]  [<ffffffff8109a9b0>] ? wake_up_state+0x20/0x20
[ 9600.520922]  [<ffffffffa043b386>] xfs_log_force+0x26/0x80 [xfs]
[ 9600.520947]  [<ffffffffa03f3b6d>] xfs_fs_sync_fs+0x2d/0x50 [xfs]
[ 9600.520954]  [<ffffffff811edc22>] sync_filesystem+0x72/0xa0
[ 9600.520960]  [<ffffffff811bfe30>] generic_shutdown_super+0x30/0xf0
[ 9600.520966]  [<ffffffff811c0127>] kill_block_super+0x27/0x70
[ 9600.520971]  [<ffffffff811c040d>] deactivate_locked_super+0x3d/0x60
[ 9600.520976]  [<ffffffff811c09c6>] deactivate_super+0x46/0x60
[ 9600.520981]  [<ffffffff811dd856>] mntput_no_expire+0xd6/0x170
[ 9600.520986]  [<ffffffff811dedfe>] SyS_umount+0x8e/0x100
[ 9600.520991]  [<ffffffff8173186d>] system_call_fastpath+0x1a/0x1f
[ 9720.520295] INFO: task xfsaild/rbd1:5577 blocked for more than 120 seconds.
[ 9720.520449]       Not tainted 3.13.0-32-generic #57-Ubuntu
[ 9720.520574] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9720.520797] xfsaild/rbd1    D ffff88003fc13480     0  5577      2 0x00000000
[ 9720.520805]  ffff88003b571d58 0000000000000046 ffff88003c404800 ffff88003b571fd8
[ 9720.520811]  0000000000013480 0000000000013480 ffff88003c404800 ffff88003c404800
[ 9720.520815]  ffff88003fc141a0 0000000000000001 0000000000000000 ffff88003a377928
[ 9720.520819] Call Trace:
[ 9720.520835]  [<ffffffff817251a9>] schedule+0x29/0x70
[ 9720.520887]  [<ffffffffa043b300>] _xfs_log_force+0x220/0x280 [xfs]
[ 9720.520896]  [<ffffffff8109a9b0>] ? wake_up_state+0x20/0x20
[ 9720.520927]  [<ffffffffa043b386>] xfs_log_force+0x26/0x80 [xfs]
[ 9720.520958]  [<ffffffffa043f920>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
[ 9720.520986]  [<ffffffffa043fa61>] xfsaild+0x141/0x5c0 [xfs]
[ 9720.521013]  [<ffffffffa043f920>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
[ 9720.521019]  [<ffffffff8108b572>] kthread+0xd2/0xf0
[ 9720.521024]  [<ffffffff8108b4a0>] ? kthread_create_on_node+0x1c0/0x1c0
[ 9720.521029]  [<ffffffff817317bc>] ret_from_fork+0x7c/0xb0
[ 9720.521033]  [<ffffffff8108b4a0>] ? kthread_create_on_node+0x1c0/0x1c0

Sincerely yours,
WD Hwang