Re: qdiskd master election and loss of quorum

On Tue, 03 Nov 2009 08:15:05 -0500 Lon Hohberger  wrote:
> Though it's a bit odd that stopping node 1 causes a loss of quorum on
> node2. :(

I'm experiencing the same behaviour with a two-node cluster on CentOS 5.4:
openais-0.80.6-8.el5_4.1
cman-2.0.115-1.el5_4.3
rgmanager-2.0.52-1.el5.centos.2

Here are the relevant lines from cluster.conf; the simulated scenario follows below.
[root@mork ~]# egrep "totem|quorum" /etc/cluster/cluster.conf
    <totem token="162000"/>
    <cman quorum_dev_poll="80000" expected_votes="3" two_node="0"/>
    <quorumd device="/dev/sda" interval="5" label="clummquorum" log_facility="local4" log_level="7" tko="16" votes="1">
    </quorumd>

The white paper Alain referred to, apart from being related to multipath as he already wrote, only says that quorum_dev_poll must be less than the totem token,
and that quorum_dev_poll should be configured to be greater than the multipath failover value (but we don't have multipath here...).
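
If I read that correctly, my values should already satisfy both constraints. A quick sanity check of the arithmetic (the interval * tko formula for the qdisk heartbeat timeout is my understanding of how qdiskd computes it, so take it as an assumption):

# qdisk heartbeat timeout = interval * tko = 5 s * 16 = 80 s = 80000 ms
# quorum_dev_poll         = 80000 ms  -> covers the qdisk timeout
# totem token             = 162000 ms -> larger than quorum_dev_poll
interval=5; tko=16
echo "qdisk timeout: $(( interval * tko * 1000 )) ms, quorum_dev_poll: 80000 ms, totem token: 162000 ms"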

- mork is the second node; it has no active services and its qdiskd is not the master at this moment:
logs on mork
[root@mork ~]# tail -f /var/log/messages
Nov  5 12:35:41 mork ricci: startup succeeded
Nov  5 12:35:42 mork clurgmgrd: [2633]: <err>   node2   owns vg_cl1/lv_cl1 unable to stop
Nov  5 12:35:42 mork clurgmgrd[2633]: <notice> stop on lvm "CL1" returned 1 (generic error)
Nov  5 12:35:42 mork clurgmgrd: [2633]: <err>   node2   owns vg_cl2/lv_cl2 unable to stop
Nov  5 12:35:42 mork clurgmgrd[2633]: <notice> stop on lvm "CL2" returned 1 (generic error)
Nov  5 12:36:02 mork qdiskd[2214]: <info> Node 2 is the master
Nov  5 12:36:52 mork qdiskd[2214]: <info> Initial score 1/1
Nov  5 12:36:52 mork qdiskd[2214]: <info> Initialization complete
Nov  5 12:36:52 mork openais[2185]: [CMAN ] quorum device registered
Nov  5 12:36:52 mork qdiskd[2214]: <notice> Score sufficient for master operation (1/1; required=1); upgrading


- shutdown of the other node (mindy), which currently owns the three services (note that mindy shuts down cleanly)
logs on mork
Nov  5 12:52:53 mork clurgmgrd[2633]: <notice> Member 2 shutting down
Nov  5 12:52:57 mork qdiskd[2214]: <info> Node 2 shutdown
Nov  5 12:52:58 mork clurgmgrd[2633]: <notice> Starting stopped service service:MM1SRV
Nov  5 12:52:58 mork clurgmgrd[2633]: <notice> Starting stopped service service:MM2SRV
Nov  5 12:52:58 mork clurgmgrd[2633]: <notice> Starting stopped service service:MM3SRV
Nov  5 12:52:58 mork clurgmgrd: [2633]: <notice> Activating vg_cl1/lv_cl1
Nov  5 12:52:58 mork clurgmgrd: [2633]: <notice> Making resilient : lvchange -ay vg_cl1/lv_cl1
Nov  5 12:52:59 mork clurgmgrd: [2633]: <notice> Activating vg_cl2/lv_cl2
Nov  5 12:52:59 mork clurgmgrd: [2633]: <notice> Resilient command: lvchange -ay vg_cl1/lv_cl1 --config devices{filter=["a|/dev/hda2|","a|/dev/hdb1|","a|/dev/sdb1|","a|/dev/sdc1|","r|.*|"]}
Nov  5 12:52:59 mork clurgmgrd: [2633]: <notice> Making resilient : lvchange -ay vg_cl2/lv_cl2
Nov  5 12:52:59 mork clurgmgrd: [2633]: <notice> Resilient command: lvchange -ay vg_cl2/lv_cl2 --config devices{filter=["a|/dev/hda2|","a|/dev/hdb1|","a|/dev/sdb1|","a|/dev/sdc1|","r|.*|"]}
Nov  5 12:52:59 mork kernel: kjournald starting.  Commit interval 5 seconds
Nov  5 12:52:59 mork kernel: EXT3 FS on dm-3, internal journal
Nov  5 12:52:59 mork kernel: EXT3-fs: mounted filesystem with ordered data mode.
Nov  5 12:52:59 mork kernel: kjournald starting.  Commit interval 5 seconds
Nov  5 12:52:59 mork kernel: EXT3 FS on dm-4, internal journal
Nov  5 12:52:59 mork kernel: EXT3-fs: mounted filesystem with ordered data mode.
Nov  5 12:53:15 mork clurgmgrd[2633]: <err> #75: Failed changing service status
Nov  5 12:53:30 mork clurgmgrd[2633]: <err> #75: Failed changing service status
Nov  5 12:53:30 mork clurgmgrd[2633]: <notice> Stopping service service:MM3SRV
Nov  5 12:53:32 mork qdiskd[2214]: <info> Assuming master role
Nov  5 12:53:45 mork clurgmgrd[2633]: <err> #52: Failed changing RG status
Nov  5 12:53:45 mork clurgmgrd[2633]: <crit> #13: Service service:MM3SRV failed to stop cleanly

- clustat run several times on mork during this phase (note the timeout messages)
[root@mork ~]# clustat
Timed out waiting for a response from Resource Group Manager
Cluster Status for clumm @ Thu Nov  5 12:54:08 2009
Member Status: Quorate

 Member Name                                                    ID   Status
 ------ ----                                                    ---- ------
 node1                                                              1 Online, Local
 node2                                                              2 Offline
 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_scsi0-hd0                 0 Online, Quorum Disk

[root@mork ~]# clustat
Service states unavailable: Temporary failure; try again
Cluster Status for clumm @ Thu Nov  5 12:54:14 2009
Member Status: Quorate

 Member Name                                                    ID   Status
 ------ ----                                                    ---- ------
 node1                                                              1 Online, Local
 node2                                                              2 Offline
 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_scsi0-hd0                 0 Online, Quorum Disk

[root@mork ~]# clustat
Service states unavailable: Temporary failure; try again
Cluster Status for clumm @ Thu Nov  5 12:54:15 2009
Member Status: Quorate

 Member Name                                                    ID   Status
 ------ ----                                                    ---- ------
 node1                                                              1 Online, Local
 node2                                                              2 Offline
 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_scsi0-hd0                 0 Online, Quorum Disk


[root@mork ~]# clustat
Timed out waiting for a response from Resource Group Manager
Cluster Status for clumm @ Thu Nov  5 12:54:46 2009
Member Status: Quorate

 Member Name                                                    ID   Status
 ------ ----                                                    ---- ------
 node1                                                              1 Online, Local
 node2                                                              2 Offline
 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_scsi0-hd0                 0 Online, Quorum Disk

- service manager is running
[root@mork ~]# service rgmanager status
clurgmgrd (pid  2632) is running...

- cman_tool command outputs
[root@mork ~]# cman_tool services
type             level name       id       state      
fence            0     default    00010001 none       
[1]
dlm              1     rgmanager  00020001 none       
[1]

[root@mork ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   0   M      0   2009-11-05 12:36:52  /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_scsi0-hd0
   1   M     52   2009-11-05 12:35:30  node1
   2   X     56                        node2

[root@mork ~]# cman_tool status
Version: 6.2.0
Config Version: 7
Cluster Name: clumm
Cluster Id: 3243
Cluster Member: Yes
Cluster Generation: 56
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Quorum device votes: 1
Total votes: 2
Quorum: 2 
Active subsystems: 9
Flags: Dirty
Ports Bound: 0 177 
Node name: node1
Node ID: 1
Multicast addresses: 239.192.12.183
Node addresses: 172.16.0.11
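
As an aside, the quorum arithmetic above looks consistent to me: node1's vote plus the quorum device vote gives "Total votes: 2", which meets "Quorum: 2", so the node stays quorate with node2 offline. A tiny sketch of that calculation (the floor(expected_votes/2)+1 formula is only my assumption of how cman derives the threshold):

expected_votes=3
echo "quorum threshold: $(( expected_votes / 2 + 1 )), votes held: 2 (node1 + qdisk)"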

- now clustat gives output, but the services remain in "starting" and never move to "started"
[root@mork ~]# clustat
Cluster Status for clumm @ Thu Nov  5 12:55:16 2009
Member Status: Quorate

 Member Name                                                    ID   Status
 ------ ----                                                    ---- ------
 node1                                                              1 Online, Local, rgmanager
 node2                                                              2 Offline
 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_scsi0-hd0                 0 Online, Quorum Disk

 Service Name                                          Owner (Last)                                          State        
 ------- ----                                          ----- ------                                          -----        
 service:MM1SRV                                        node1                                                 starting     
 service:MM2SRV                                        node1                                                 starting     
 service:MM3SRV                                        node1                                                 starting     

- latest entries in messages  
[root@mork ~]# tail -f  /var/log/messages
Nov  5 12:53:45 mork clurgmgrd[2633]: <crit> #13: Service service:MM3SRV failed to stop cleanly
Nov  5 12:54:00 mork clurgmgrd[2633]: <err> #75: Failed changing service status
Nov  5 12:54:15 mork clurgmgrd[2633]: <err> #57: Failed changing RG status
Nov  5 12:54:15 mork clurgmgrd[2633]: <notice> Stopping service service:MM1SRV
Nov  5 12:54:30 mork clurgmgrd[2633]: <notice> Stopping service service:MM2SRV
Nov  5 12:54:30 mork clurgmgrd[2633]: <err> #52: Failed changing RG status
Nov  5 12:54:30 mork clurgmgrd[2633]: <crit> #13: Service service:MM1SRV failed to stop cleanly
Nov  5 12:54:45 mork clurgmgrd[2633]: <err> #52: Failed changing RG status
Nov  5 12:54:45 mork clurgmgrd[2633]: <crit> #13: Service service:MM2SRV failed to stop cleanly
Nov  5 12:55:00 mork clurgmgrd[2633]: <err> #57: Failed changing RG status

- new entries in messages  
[root@mork ~]# tail -f  /var/log/messages
Nov  5 12:54:30 mork clurgmgrd[2633]: <err> #52: Failed changing RG status
Nov  5 12:54:30 mork clurgmgrd[2633]: <crit> #13: Service service:MM1SRV failed to stop cleanly
Nov  5 12:54:45 mork clurgmgrd[2633]: <err> #52: Failed changing RG status
Nov  5 12:54:45 mork clurgmgrd[2633]: <crit> #13: Service service:MM2SRV failed to stop cleanly
Nov  5 12:55:00 mork clurgmgrd[2633]: <err> #57: Failed changing RG status
Nov  5 12:55:15 mork clurgmgrd[2633]: <err> #57: Failed changing RG status
Nov  5 12:55:41 mork openais[2185]: [TOTEM] The token was lost in the OPERATIONAL state.
Nov  5 12:55:41 mork openais[2185]: [TOTEM] Receive multicast socket recv buffer size (320000 bytes).
Nov  5 12:55:41 mork openais[2185]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Nov  5 12:55:41 mork openais[2185]: [TOTEM] entering GATHER state from 2.
Nov  5 12:55:46 mork openais[2185]: [TOTEM] entering GATHER state from 0.
Nov  5 12:55:46 mork openais[2185]: [TOTEM] Creating commit token because I am the rep.
Nov  5 12:55:46 mork openais[2185]: [TOTEM] Saving state aru 64 high seq received 64
Nov  5 12:55:46 mork openais[2185]: [TOTEM] Storing new sequence id for ring 3c
Nov  5 12:55:46 mork openais[2185]: [TOTEM] entering COMMIT state.
Nov  5 12:55:46 mork openais[2185]: [TOTEM] entering RECOVERY state.
Nov  5 12:55:46 mork openais[2185]: [TOTEM] position [0] member 172.16.0.11:
Nov  5 12:55:46 mork openais[2185]: [TOTEM] previous ring seq 56 rep 172.16.0.11
Nov  5 12:55:46 mork openais[2185]: [TOTEM] aru 64 high delivered 64 received flag 1
Nov  5 12:55:46 mork openais[2185]: [TOTEM] Did not need to originate any messages in recovery.
Nov  5 12:55:46 mork openais[2185]: [TOTEM] Sending initial ORF token
Nov  5 12:55:46 mork openais[2185]: [CLM  ] CLM CONFIGURATION CHANGE
Nov  5 12:55:46 mork openais[2185]: [CLM  ] New Configuration:
Nov  5 12:55:46 mork kernel: dlm: closing connection to node 2
Nov  5 12:55:46 mork openais[2185]: [CLM  ]     r(0) ip(172.16.0.11) 
Nov  5 12:55:46 mork openais[2185]: [CLM  ] Members Left:
Nov  5 12:55:46 mork openais[2185]: [CLM  ]     r(0) ip(172.16.0.12) 
Nov  5 12:55:46 mork openais[2185]: [CLM  ] Members Joined:
Nov  5 12:55:46 mork openais[2185]: [CLM  ] CLM CONFIGURATION CHANGE
Nov  5 12:55:46 mork openais[2185]: [CLM  ] New Configuration:
Nov  5 12:55:46 mork openais[2185]: [CLM  ]     r(0) ip(172.16.0.11) 
Nov  5 12:55:46 mork openais[2185]: [CLM  ] Members Left:
Nov  5 12:55:46 mork openais[2185]: [CLM  ] Members Joined:
Nov  5 12:55:46 mork openais[2185]: [SYNC ] This node is within the primary component and will provide service.
Nov  5 12:55:46 mork openais[2185]: [TOTEM] entering OPERATIONAL state.
Nov  5 12:55:46 mork openais[2185]: [CLM  ] got nodejoin message 172.16.0.11
Nov  5 12:55:46 mork openais[2185]: [CPG  ] got joinlist message from node 1

- services remain in "starting"
[root@mork ~]# clustat
Cluster Status for clumm @ Thu Nov  5 12:58:47 2009
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 node1                                                               1 Online, Local, rgmanager
 node2                                                               2 Offline
 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_scsi0-hd0                  0 Online, Quorum Disk

 Service Name                                                Owner (Last)                                                State        
 ------- ----                                                ----- ------                                                -----        
 service:MM1SRV                                              node1                                                       starting     
 service:MM2SRV                                              node1                                                       starting     
 service:MM3SRV                                              node1                                                       starting     

- services MM1SRV and MM2SRV are ip+fs (/cl1 and /cl2 respectively): they are active, so it seems everything was done correctly, except that they never pass from "starting" to "started"...
Also MM3SRV, which is an IP-only service, has been started.

[root@mork ~]# df -k
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                       5808616   4045884   1462908  74% /
/dev/hda1               101086     38786     57081  41% /boot
tmpfs                   447656         0    447656   0% /dev/shm
/dev/mapper/vg_cl1-lv_cl1
                       4124352   1258064   2656780  33% /cl1
/dev/mapper/vg_cl2-lv_cl2
                       4124352   1563032   2351812  40% /cl2

[root@mork ~]# ip addr list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 54:52:00:6a:cb:ba brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.101/24 brd 192.168.122.255 scope global eth0
    inet 192.168.122.113/24 scope global secondary eth0   <--- MM3SRV ip
    inet 192.168.122.111/24 scope global secondary eth0   <--- MM1SRV ip
    inet 192.168.122.112/24 scope global secondary eth0   <--- MM2SRV ip
    inet6 fe80::5652:ff:fe6a:cbba/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 54:52:00:00:0c:c5 brd ff:ff:ff:ff:ff:ff
    inet 172.16.0.11/12 brd 172.31.255.255 scope global eth1
    inet6 fe80::5652:ff:fe00:cc5/64 scope link
       valid_lft forever preferred_lft forever
4: sit0: <NOARP> mtu 1480 qdisc noop
    link/sit 0.0.0.0 brd 0.0.0.0
[root@mork ~]#

- I wait a couple of hours
[root@mork ~]# clustat
Cluster Status for clumm @ Thu Nov  5 15:22:23 2009
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 node1                                                               1 Online, Local, rgmanager
 node2                                                               2 Offline
 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_scsi0-hd0                  0 Online, Quorum Disk

 Service Name                                                     Owner (Last)                                                     State        
 ------- ----                                                     ----- ------                                                     -----        
 service:MM1SRV                                                   node1                                                            starting     
 service:MM2SRV                                                   node1                                                            starting     
 service:MM3SRV                                                   node1                                                            starting     

- resource groups are unlocked:
[root@mork ~]# clusvcadm -S
Resource groups unlocked

- trying to enable one of the stuck services:
[root@mork ~]# clusvcadm -e MM3SRV
Local machine trying to enable service:MM3SRV...Service is already running

Note that the other node is still powered off
- So, to resolve the situation, I have to run a disable/enable sequence, which causes downtime (the IP alias is removed and the file systems are unmounted in my case); see the scripted sketch after this walkthrough:
[root@mork ~]# clusvcadm -d MM3SRV
Local machine disabling service:MM3SRV...Success

[root@mork ~]# clustat
Cluster Status for clumm @ Thu Nov  5 15:25:49 2009
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 node1                                                               1 Online, Local, rgmanager
 node2                                                               2 Offline
 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_scsi0-hd0                  0 Online, Quorum Disk

 Service Name                                                     Owner (Last)                                                     State        
 ------- ----                                                     ----- ------                                                     -----        
 service:MM1SRV                                                   node1                                                            starting     
 service:MM2SRV                                                   node1                                                            starting     
 service:MM3SRV                                                   (node1)                                                          disabled  

[root@mork ~]# clusvcadm -e MM3SRV
Local machine trying to enable service:MM3SRV...Success
service:MM3SRV is now running on node1
[root@mork ~]# clusvcadm -d MM1SRV
Local machine disabling service:MM1SRV...Success
[root@mork ~]# df -k
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                       5808616   4047656   1461136  74% /
/dev/hda1               101086     38786     57081  41% /boot
tmpfs                   447656         0    447656   0% /dev/shm
/dev/mapper/vg_cl2-lv_cl2
                       4124352   1563032   2351812  40% /cl2
[root@mork ~]# clusvcadm -e MM1SRV
Local machine trying to enable service:MM1SRV...Success
service:MM1SRV is now running on node1
[root@mork ~]# df -k
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                       5808616   4047664   1461128  74% /
/dev/hda1               101086     38786     57081  41% /boot
tmpfs                   447656         0    447656   0% /dev/shm
/dev/mapper/vg_cl2-lv_cl2
                       4124352   1563032   2351812  40% /cl2
/dev/mapper/vg_cl1-lv_cl1
                       4124352   1258064   2656780  33% /cl1
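
For what it's worth, a minimal sketch of the same workaround scripted over all three services (the service names and the plain clusvcadm -d / -e calls are the ones used above; looping over them is just my shortcut and still causes a short outage per service):

#!/bin/bash
# Cycle each service stuck in "starting" through disabled -> enabled
# on the local node. Each disable removes the IP alias and unmounts
# the file system, so expect a brief downtime per service.
for svc in MM1SRV MM2SRV MM3SRV; do
    clusvcadm -d "$svc" && clusvcadm -e "$svc"
done
clustat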


Gianluca
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
