Re: qdiskd master election and loss of quorum

On Tue, 03 Nov 2009 08:15:05 -0500 Lon Hohberger  wrote:
> Though it's a bit odd that stopping node 1 causes a loss of quorum on
> node2. :(

I'm experiencing the same behaviour with a two-node cluster on CentOS 5.4:
openais-0.80.6-8.el5_4.1
cman-2.0.115-1.el5_4.3
rgmanager-2.0.52-1.el5.centos.2

Here are the relevant lines from cluster.conf; the simulated scenario follows below.
[root@mork ~]# egrep "totem|quorum" /etc/cluster/cluster.conf
    <totem token="162000"/>
    <cman quorum_dev_poll="80000" expected_votes="3" two_node="0"/>
    <quorumd device="/dev/sda" interval="5" label="clummquorum" log_facility="local4" log_level="7" tko="16" votes="1">
    </quorumd>

The white paper Alain referred to, apart from being related to multipath as he already wrote, only says that quorum_dev_poll must be less than the totem token,
and that quorum_dev_poll should be configured to be greater than the multipath failover value (but we don't have multipath here...).
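
If I read that correctly, my values should already satisfy both constraints. A quick sanity check of the arithmetic (the interval * tko formula for the qdisk heartbeat timeout is my understanding of how qdiskd computes it, so take it as an assumption):

# qdisk heartbeat timeout = interval * tko = 5 s * 16 = 80 s = 80000 ms
# quorum_dev_poll         = 80000 ms  -> covers the qdisk timeout
# totem token             = 162000 ms -> larger than quorum_dev_poll
interval=5; tko=16
echo "qdisk timeout: $(( interval * tko * 1000 )) ms, quorum_dev_poll: 80000 ms, totem token: 162000 ms"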

- mork is the second node; it has no active services and its qdiskd is not the master at this moment:
logs on mork
[root@mork ~]# tail -f /var/log/messages
Nov  5 12:35:41 mork ricci: startup succeeded
Nov  5 12:35:42 mork clurgmgrd: [2633]: <err>   node2   owns vg_cl1/lv_cl1 unable to stop
Nov  5 12:35:42 mork clurgmgrd[2633]: <notice> stop on lvm "CL1" returned 1 (generic error)
Nov  5 12:35:42 mork clurgmgrd: [2633]: <err>   node2   owns vg_cl2/lv_cl2 unable to stop
Nov  5 12:35:42 mork clurgmgrd[2633]: <notice> stop on lvm "CL2" returned 1 (generic error)
Nov  5 12:36:02 mork qdiskd[2214]: <info> Node 2 is the master
Nov  5 12:36:52 mork qdiskd[2214]: <info> Initial score 1/1
Nov  5 12:36:52 mork qdiskd[2214]: <info> Initialization complete
Nov  5 12:36:52 mork openais[2185]: [CMAN ] quorum device registered
Nov  5 12:36:52 mork qdiskd[2214]: <notice> Score sufficient for master operation (1/1; required=1); upgrading


- shutdown of the other node (mindy), which currently owns the three services (note that mindy shuts down cleanly)
logs on mork
Nov  5 12:52:53 mork clurgmgrd[2633]: <notice> Member 2 shutting down
Nov  5 12:52:57 mork qdiskd[2214]: <info> Node 2 shutdown
Nov  5 12:52:58 mork clurgmgrd[2633]: <notice> Starting stopped service service:MM1SRV
Nov  5 12:52:58 mork clurgmgrd[2633]: <notice> Starting stopped service service:MM2SRV
Nov  5 12:52:58 mork clurgmgrd[2633]: <notice> Starting stopped service service:MM3SRV
Nov  5 12:52:58 mork clurgmgrd: [2633]: <notice> Activating vg_cl1/lv_cl1
Nov  5 12:52:58 mork clurgmgrd: [2633]: <notice> Making resilient : lvchange -ay vg_cl1/lv_cl1
Nov  5 12:52:59 mork clurgmgrd: [2633]: <notice> Activating vg_cl2/lv_cl2
Nov  5 12:52:59 mork clurgmgrd: [2633]: <notice> Resilient command: lvchange -ay vg_cl1/lv_cl1 --config devices{filter=["a|/dev/hda2|","a|/dev/hdb1|","a|/dev/sdb1|","a|/dev/sdc1|","r|.*|"]}
Nov  5 12:52:59 mork clurgmgrd: [2633]: <notice> Making resilient : lvchange -ay vg_cl2/lv_cl2
Nov  5 12:52:59 mork clurgmgrd: [2633]: <notice> Resilient command: lvchange -ay vg_cl2/lv_cl2 --config devices{filter=["a|/dev/hda2|","a|/dev/hdb1|","a|/dev/sdb1|","a|/dev/sdc1|","r|.*|"]}
Nov  5 12:52:59 mork kernel: kjournald starting.  Commit interval 5 seconds
Nov  5 12:52:59 mork kernel: EXT3 FS on dm-3, internal journal
Nov  5 12:52:59 mork kernel: EXT3-fs: mounted filesystem with ordered data mode.
Nov  5 12:52:59 mork kernel: kjournald starting.  Commit interval 5 seconds
Nov  5 12:52:59 mork kernel: EXT3 FS on dm-4, internal journal
Nov  5 12:52:59 mork kernel: EXT3-fs: mounted filesystem with ordered data mode.
Nov  5 12:53:15 mork clurgmgrd[2633]: <err> #75: Failed changing service status
Nov  5 12:53:30 mork clurgmgrd[2633]: <err> #75: Failed changing service status
Nov  5 12:53:30 mork clurgmgrd[2633]: <notice> Stopping service service:MM3SRV
Nov  5 12:53:32 mork qdiskd[2214]: <info> Assuming master role
Nov  5 12:53:45 mork clurgmgrd[2633]: <err> #52: Failed changing RG status
Nov  5 12:53:45 mork clurgmgrd[2633]: <crit> #13: Service service:MM3SRV failed to stop cleanly

- clustat run several times on mork during this phase (note the timeout messages)
[root@mork ~]# clustat
Timed out waiting for a response from Resource Group Manager
Cluster Status for clumm @ Thu Nov  5 12:54:08 2009
Member Status: Quorate

 Member Name                                                    ID   Status
 ------ ----                                                    ---- ------
 node1                                                              1 Online, Local
 node2                                                              2 Offline
 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_scsi0-hd0                 0 Online, Quorum Disk

[root@mork ~]# clustat
Service states unavailable: Temporary failure; try again
Cluster Status for clumm @ Thu Nov  5 12:54:14 2009
Member Status: Quorate

 Member Name                                                    ID   Status
 ------ ----                                                    ---- ------
 node1                                                              1 Online, Local
 node2                                                              2 Offline
 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_scsi0-hd0                 0 Online, Quorum Disk

[root@mork ~]# clustat
Service states unavailable: Temporary failure; try again
Cluster Status for clumm @ Thu Nov  5 12:54:15 2009
Member Status: Quorate

 Member Name                                                    ID   Status
 ------ ----                                                    ---- ------
 node1                                                              1 Online, Local
 node2                                                              2 Offline
 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_scsi0-hd0                 0 Online, Quorum Disk


[root@mork ~]# clustat
Timed out waiting for a response from Resource Group Manager
Cluster Status for clumm @ Thu Nov  5 12:54:46 2009
Member Status: Quorate

 Member Name                                                    ID   Status
 ------ ----                                                    ---- ------
 node1                                                              1 Online, Local
 node2                                                              2 Offline
 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_scsi0-hd0                 0 Online, Quorum Disk

- service manager is running
[root@mork ~]# service rgmanager status
clurgmgrd (pid  2632) is running...

- cman_tool command outputs
[root@mork ~]# cman_tool services
type             level name       id       state      
fence            0     default    00010001 none       
[1]
dlm              1     rgmanager  00020001 none       
[1]

[root@mork ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   0   M      0   2009-11-05 12:36:52  /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_scsi0-hd0
   1   M     52   2009-11-05 12:35:30  node1
   2   X     56                        node2

[root@mork ~]# cman_tool status
Version: 6.2.0
Config Version: 7
Cluster Name: clumm
Cluster Id: 3243
Cluster Member: Yes
Cluster Generation: 56
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Quorum device votes: 1
Total votes: 2
Quorum: 2 
Active subsystems: 9
Flags: Dirty
Ports Bound: 0 177 
Node name: node1
Node ID: 1
Multicast addresses: 239.192.12.183
Node addresses: 172.16.0.11
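
As an aside, the quorum arithmetic above looks consistent to me: node1's vote plus the quorum device vote gives "Total votes: 2", which meets "Quorum: 2", so the node stays quorate with node2 offline. A tiny sketch of that calculation (the floor(expected_votes/2)+1 formula is only my assumption of how cman derives the threshold):

expected_votes=3
echo "quorum threshold: $(( expected_votes / 2 + 1 )), votes held: 2 (node1 + qdisk)"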

- now clustat gives output, but the services remain in "starting" and never move to "started"
[root@mork ~]# clustat
Cluster Status for clumm @ Thu Nov  5 12:55:16 2009
Member Status: Quorate

 Member Name                                                    ID   Status
 ------ ----                                                    ---- ------
 node1                                                              1 Online, Local, rgmanager
 node2                                                              2 Offline
 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_scsi0-hd0                 0 Online, Quorum Disk

 Service Name                                          Owner (Last)                                          State        
 ------- ----                                          ----- ------                                          -----        
 service:MM1SRV                                        node1                                                 starting     
 service:MM2SRV                                        node1                                                 starting     
 service:MM3SRV                                        node1                                                 starting     

- latest entries in messages  
[root@mork ~]# tail -f  /var/log/messages
Nov  5 12:53:45 mork clurgmgrd[2633]: <crit> #13: Service service:MM3SRV failed to stop cleanly
Nov  5 12:54:00 mork clurgmgrd[2633]: <err> #75: Failed changing service status
Nov  5 12:54:15 mork clurgmgrd[2633]: <err> #57: Failed changing RG status
Nov  5 12:54:15 mork clurgmgrd[2633]: <notice> Stopping service service:MM1SRV
Nov  5 12:54:30 mork clurgmgrd[2633]: <notice> Stopping service service:MM2SRV
Nov  5 12:54:30 mork clurgmgrd[2633]: <err> #52: Failed changing RG status
Nov  5 12:54:30 mork clurgmgrd[2633]: <crit> #13: Service service:MM1SRV failed to stop cleanly
Nov  5 12:54:45 mork clurgmgrd[2633]: <err> #52: Failed changing RG status
Nov  5 12:54:45 mork clurgmgrd[2633]: <crit> #13: Service service:MM2SRV failed to stop cleanly
Nov  5 12:55:00 mork clurgmgrd[2633]: <err> #57: Failed changing RG status

- new entries in messages  
[root@mork ~]# tail -f  /var/log/messages
Nov  5 12:54:30 mork clurgmgrd[2633]: <err> #52: Failed changing RG status
Nov  5 12:54:30 mork clurgmgrd[2633]: <crit> #13: Service service:MM1SRV failed to stop cleanly
Nov  5 12:54:45 mork clurgmgrd[2633]: <err> #52: Failed changing RG status
Nov  5 12:54:45 mork clurgmgrd[2633]: <crit> #13: Service service:MM2SRV failed to stop cleanly
Nov  5 12:55:00 mork clurgmgrd[2633]: <err> #57: Failed changing RG status
Nov  5 12:55:15 mork clurgmgrd[2633]: <err> #57: Failed changing RG status
Nov  5 12:55:41 mork openais[2185]: [TOTEM] The token was lost in the OPERATIONAL state.
Nov  5 12:55:41 mork openais[2185]: [TOTEM] Receive multicast socket recv buffer size (320000 bytes).
Nov  5 12:55:41 mork openais[2185]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Nov  5 12:55:41 mork openais[2185]: [TOTEM] entering GATHER state from 2.
Nov  5 12:55:46 mork openais[2185]: [TOTEM] entering GATHER state from 0.
Nov  5 12:55:46 mork openais[2185]: [TOTEM] Creating commit token because I am the rep.
Nov  5 12:55:46 mork openais[2185]: [TOTEM] Saving state aru 64 high seq received 64
Nov  5 12:55:46 mork openais[2185]: [TOTEM] Storing new sequence id for ring 3c
Nov  5 12:55:46 mork openais[2185]: [TOTEM] entering COMMIT state.
Nov  5 12:55:46 mork openais[2185]: [TOTEM] entering RECOVERY state.
Nov  5 12:55:46 mork openais[2185]: [TOTEM] position [0] member 172.16.0.11:
Nov  5 12:55:46 mork openais[2185]: [TOTEM] previous ring seq 56 rep 172.16.0.11
Nov  5 12:55:46 mork openais[2185]: [TOTEM] aru 64 high delivered 64 received flag 1
Nov  5 12:55:46 mork openais[2185]: [TOTEM] Did not need to originate any messages in recovery.
Nov  5 12:55:46 mork openais[2185]: [TOTEM] Sending initial ORF token
Nov  5 12:55:46 mork openais[2185]: [CLM  ] CLM CONFIGURATION CHANGE
Nov  5 12:55:46 mork openais[2185]: [CLM  ] New Configuration:
Nov  5 12:55:46 mork kernel: dlm: closing connection to node 2
Nov  5 12:55:46 mork openais[2185]: [CLM  ]     r(0) ip(172.16.0.11) 
Nov  5 12:55:46 mork openais[2185]: [CLM  ] Members Left:
Nov  5 12:55:46 mork openais[2185]: [CLM  ]     r(0) ip(172.16.0.12) 
Nov  5 12:55:46 mork openais[2185]: [CLM  ] Members Joined:
Nov  5 12:55:46 mork openais[2185]: [CLM  ] CLM CONFIGURATION CHANGE
Nov  5 12:55:46 mork openais[2185]: [CLM  ] New Configuration:
Nov  5 12:55:46 mork openais[2185]: [CLM  ]     r(0) ip(172.16.0.11) 
Nov  5 12:55:46 mork openais[2185]: [CLM  ] Members Left:
Nov  5 12:55:46 mork openais[2185]: [CLM  ] Members Joined:
Nov  5 12:55:46 mork openais[2185]: [SYNC ] This node is within the primary component and will provide service.
Nov  5 12:55:46 mork openais[2185]: [TOTEM] entering OPERATIONAL state.
Nov  5 12:55:46 mork openais[2185]: [CLM  ] got nodejoin message 172.16.0.11
Nov  5 12:55:46 mork openais[2185]: [CPG  ] got joinlist message from node 1

- services remain in "starting"
[root@mork ~]# clustat
Cluster Status for clumm @ Thu Nov  5 12:58:47 2009
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 node1                                                               1 Online, Local, rgmanager
 node2                                                               2 Offline
 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_scsi0-hd0                  0 Online, Quorum Disk

 Service Name                                                Owner (Last)                                                State        
 ------- ----                                                ----- ------                                                -----        
 service:MM1SRV                                              node1                                                       starting     
 service:MM2SRV                                              node1                                                       starting     
 service:MM3SRV                                              node1                                                       starting     

- services MM1SRV and MM2SRV are ip+fs (/cl1 and /cl2 respectively): they are active, so it seems everything was done correctly, except that they never pass from "starting" to "started"...
Also MM3SRV, which is an IP-only service, has been started.

[root@mork ~]# df -k
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                       5808616   4045884   1462908  74% /
/dev/hda1               101086     38786     57081  41% /boot
tmpfs                   447656         0    447656   0% /dev/shm
/dev/mapper/vg_cl1-lv_cl1
                       4124352   1258064   2656780  33% /cl1
/dev/mapper/vg_cl2-lv_cl2
                       4124352   1563032   2351812  40% /cl2

[root@mork ~]# ip addr list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 54:52:00:6a:cb:ba brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.101/24 brd 192.168.122.255 scope global eth0
    inet 192.168.122.113/24 scope global secondary eth0   <--- MM3SRV ip
    inet 192.168.122.111/24 scope global secondary eth0   <--- MM1SRV ip
    inet 192.168.122.112/24 scope global secondary eth0   <--- MM2SRV ip
    inet6 fe80::5652:ff:fe6a:cbba/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 54:52:00:00:0c:c5 brd ff:ff:ff:ff:ff:ff
    inet 172.16.0.11/12 brd 172.31.255.255 scope global eth1
    inet6 fe80::5652:ff:fe00:cc5/64 scope link
       valid_lft forever preferred_lft forever
4: sit0: <NOARP> mtu 1480 qdisc noop
    link/sit 0.0.0.0 brd 0.0.0.0
[root@mork ~]#

- I wait a couple of hours
[root@mork ~]# clustat
Cluster Status for clumm @ Thu Nov  5 15:22:23 2009
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 node1                                                               1 Online, Local, rgmanager
 node2                                                               2 Offline
 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_scsi0-hd0                  0 Online, Quorum Disk

 Service Name                                                     Owner (Last)                                                     State        
 ------- ----                                                     ----- ------                                                     -----        
 service:MM1SRV                                                   node1                                                            starting     
 service:MM2SRV                                                   node1                                                            starting     
 service:MM3SRV                                                   node1                                                            starting     

- resource groups are unlocked:
[root@mork ~]# clusvcadm -S
Resource groups unlocked

- trying to enable one of the stuck services:
[root@mork ~]# clusvcadm -e MM3SRV
Local machine trying to enable service:MM3SRV...Service is already running

Note that the other node is still powered off
- So, to resolve the situation, I have to run a disable/enable sequence, which causes downtime (the IP alias is removed and the file systems are unmounted in my case); see the scripted sketch after this walkthrough:
[root@mork ~]# clusvcadm -d MM3SRV
Local machine disabling service:MM3SRV...Success

[root@mork ~]# clustat
Cluster Status for clumm @ Thu Nov  5 15:25:49 2009
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 node1                                                               1 Online, Local, rgmanager
 node2                                                               2 Offline
 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_scsi0-hd0                  0 Online, Quorum Disk

 Service Name                                                     Owner (Last)                                                     State        
 ------- ----                                                     ----- ------                                                     -----        
 service:MM1SRV                                                   node1                                                            starting     
 service:MM2SRV                                                   node1                                                            starting     
 service:MM3SRV                                                   (node1)                                                          disabled  

[root@mork ~]# clusvcadm -e MM3SRV
Local machine trying to enable service:MM3SRV...Success
service:MM3SRV is now running on node1
[root@mork ~]# clusvcadm -d MM1SRV
Local machine disabling service:MM1SRV...Success
[root@mork ~]# df -k
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                       5808616   4047656   1461136  74% /
/dev/hda1               101086     38786     57081  41% /boot
tmpfs                   447656         0    447656   0% /dev/shm
/dev/mapper/vg_cl2-lv_cl2
                       4124352   1563032   2351812  40% /cl2
[root@mork ~]# clusvcadm -e MM1SRV
Local machine trying to enable service:MM1SRV...Success
service:MM1SRV is now running on node1
[root@mork ~]# df -k
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                       5808616   4047664   1461128  74% /
/dev/hda1               101086     38786     57081  41% /boot
tmpfs                   447656         0    447656   0% /dev/shm
/dev/mapper/vg_cl2-lv_cl2
                       4124352   1563032   2351812  40% /cl2
/dev/mapper/vg_cl1-lv_cl1
                       4124352   1258064   2656780  33% /cl1
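
For what it's worth, a minimal sketch of the same workaround scripted over all three services (the service names and the plain clusvcadm -d / -e calls are the ones used above; looping over them is just my shortcut and still causes a short outage per service):

#!/bin/bash
# Cycle each service stuck in "starting" through disabled -> enabled
# on the local node. Each disable removes the IP alias and unmounts
# the file system, so expect a brief downtime per service.
for svc in MM1SRV MM2SRV MM3SRV; do
    clusvcadm -d "$svc" && clusvcadm -e "$svc"
done
clustat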


Gianluca
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
