11.09.2015 17:02, Daniel Dehennin wrote:
Hello,
On a two node cluster Ubuntu Trusty:
- Linux nebula3 3.13.0-63-generic #103-Ubuntu SMP Fri Aug 14 21:42:59
UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
- corosync 2.3.3-1ubuntu1
- pacemaker 1.1.10+git20130802-1ubuntu2.3
- dlm 4.0.1-0ubuntu1
- clvm 2.02.98-6ubuntu2
You need a newer version of this^ (the clvm line above).
2.02.102 is known to include commit 431eda6, without which the cluster is
unusable in a degraded state (even when one node is merely put into standby).
You see timeouts with both nodes online, so that is a different issue, but
the upgrade above will not hurt.
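To confirm what you are actually running, something like this should do (a
sketch; the dpkg query assumes the Ubuntu/Debian package name):

    # Installed LVM2 tool version; you want >= 2.02.102
    lvm version
    # Version of the clvm package itself
    dpkg -s clvm | grep '^Version'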
- gfs2-utils 3.1.6-0ubuntu1
The LVM commands take minutes to complete:
root@nebula3:~# time vgs
Error locking on node 40a8e784: Command timed out
Error locking on node 40a8e784: Command timed out
Error locking on node 40a8e784: Command timed out
VG #PV #LV #SN Attr VSize VFree
nebula3-vg 1 4 0 wz--n- 133,52g 0
one-fs 1 1 0 wz--nc 2,00t 0
one-production 1 0 0 wz--nc 1023,50g 1023,50g
real 5m40.233s
user 0m0.005s
sys 0m0.018s
Do you know where I can look to find what's going on?
Here is some information:
root@nebula3:~# corosync-quorumtool
Quorum information
------------------
Date: Fri Sep 11 15:57:17 2015
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 1084811139
Ring ID: 1460
Quorate: Yes
Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 2
Quorum: 1
Flags: 2Node Quorate WaitForAll LastManStanding
Better to use two_node: 1 in the quorum (votequorum) section.
That implies wait_for_all and supersedes last_man_standing for two-node
clusters.
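In corosync.conf that would look roughly like this (a sketch; merge it into
your existing quorum section):

    quorum {
            provider: corosync_votequorum
            two_node: 1
    }

With two_node: 1 the cluster stays quorate when one of the two nodes dies,
at the price of wait_for_all behaviour on startup.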
I'd also recommend setting clear_node_high_bit in the totem section; do you
use it? But even better is to add a nodelist section to corosync.conf with
manually specified nodeids.
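A sketch of both, with the addresses taken from your membership output below
and the small nodeids being example values of my choosing:

    totem {
            clear_node_high_bit: yes
            # ... keep your existing totem options here ...
    }

    nodelist {
            node {
                    ring0_addr: 192.168.231.131
                    nodeid: 1
            }
            node {
                    ring0_addr: 192.168.231.132
                    nodeid: 2
            }
    }

clear_node_high_bit keeps auto-generated nodeids out of the signed-32-bit
high range (which DLM cannot cope with), while explicit nodeids in the
nodelist avoid the huge IP-derived ids like 1084811139 entirely.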
Everything else looks fine...
Membership information
----------------------
Nodeid Votes Name
1084811139 1 192.168.231.131 (local)
1084811140 1 192.168.231.132
root@nebula3:~# dlm_tool ls
dlm lockspaces
name datastores
id 0x1b61ba6a
flags 0x00000000
change member 2 joined 1 remove 0 failed 0 seq 1,1
members 1084811139 1084811140
name clvmd
id 0x4104eefa
flags 0x00000000
change member 2 joined 1 remove 0 failed 0 seq 1,1
members 1084811139 1084811140
root@nebula3:~# dlm_tool status
cluster nodeid 1084811139 quorate 1 ring seq 1460 1460
daemon now 11026 fence_pid 0
node 1084811139 M add 455 rem 0 fail 0 fence 0 at 0 0
node 1084811140 M add 455 rem 0 fail 0 fence 0 at 0 0
Regards.