I realized, quite by accident, that any downtime on either of the nodes (e.g., a reboot) causes corruption/inconsistencies in the DRBD resources: the node that was the DRBD primary (i.e., the preferred-primary) will forcefully become primary again when it returns, thereby discarding the modifications made on the node that took over as primary while it was down.
Therefore, in order to prevent this from happening, it's probably best to REMOVE the location constraint that pins the final primitive of each group:
> crm configure location l_iSCSI-san1+DRBD-r0 p_IP-1_254 10240: san1
> crm configure location l_iSCSI-san1+DRBD-r1 p_IP-1_253 10240: san2
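i.e., delete those two constraints; something along these lines should do it (double-check the constraint IDs with "crm configure show" first):
> crm configure delete l_iSCSI-san1+DRBD-r0
> crm configure delete l_iSCSI-san1+DRBD-r1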
This will prevent Pacemaker from re-promoting the preferred-primary node when it returns and overwriting the modifications made on the node that took over in the interim. The DRBD resources can be moved manually...
> crm resource move p_IP-1_254 san1
> crm resource move p_IP-1_253 san2
...in order to distribute the workload between san1 & san2.
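Note that "crm resource move" works by adding a location constraint of its own, so once the resources are where they belong that constraint should be cleared again; something like this (the subcommand may be "unmigrate" or "clear" depending on the crm shell version):
> crm resource unmove p_IP-1_254
> crm resource unmove p_IP-1_253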
Thoughts? Suggestions?
Eric Pretorious
Truckee, CA
From: Eric <epretorious@xxxxxxxxx>
To: linux clustering <linux-cluster@xxxxxxxxxx>
Sent: Friday, January 18, 2013 12:40 PM
Subject: Re: [RESOLVED] HA iSCSI with DRBD
After rebooting both nodes, I checked the cluster status again and found this:
> san1:~ # crm_mon -1
> ============
> Last updated: Fri Jan 18 11:51:28 2013
> Last change: Fri Jan 18 09:00:03 2013 by root via cibadmin on san2
> Stack: openais
> Current DC: san2 - partition with quorum
> Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf
> 2 Nodes configured, 2 expected votes
> 9 Resources configured.
> ============
>
> Online: [ san1 san2 ]
>
> Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0]
> Masters: [ san2 ]
> Slaves: [ san1 ]
> Resource Group: g_iSCSI-san1
> p_iSCSI-san1 (ocf::heartbeat:iSCSITarget): Started san2
> p_iSCSI-san1_0 (ocf::heartbeat:iSCSILogicalUnit): Started san2
> p_iSCSI-san1_1 (ocf::heartbeat:iSCSILogicalUnit): Started san2
> p_iSCSI-san1_2 (ocf::heartbeat:iSCSILogicalUnit): Started san2
> p_iSCSI-san1_3 (ocf::heartbeat:iSCSILogicalUnit): Started san2
> p_iSCSI-san1_4 (ocf::heartbeat:iSCSILogicalUnit): Stopped
> p_IP-1_254 (ocf::heartbeat:IPaddr2): Stopped
>
> Failed actions:
> p_iSCSI-san1_4_start_0 (node=san1, call=25, rc=1, status=complete): unknown error
> p_iSCSI-san1_4_start_0 (node=san2, call=30, rc=1, status=complete): unknown error
...and that's when it occurred to me: There are only four volumes defined in the DRBD configuration (0, 1, 2, & 3) - not five (0, 1, 2, 3, & 4)! i.e., The p_iSCSI-san1_4 primitive was failing (because there is no volume /dev/drbd4) and that, in turn, was holding up the resource group g_iSCSI-san1 and causing all of the other primitives [e.g., p_IP-1_254] to fail too!
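Getting rid of the orphaned primitive boils down to stopping it, taking it out of the group, and deleting it; roughly (the exact sequence may vary with the crm shell version):
> crm resource stop p_iSCSI-san1_4
> crm configure edit g_iSCSI-san1    # remove p_iSCSI-san1_4 from the group definition
> crm configure delete p_iSCSI-san1_4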
So, I deleted p_iSCSI-san1_4 from the CIB and the cluster began working as designed:
> san2:~ # ll /dev/drbd*
> brw-rw---- 1 root disk 147, 0 Jan 18 11:47 /dev/drbd0
> brw-rw---- 1 root disk 147, 1 Jan 18 11:47 /dev/drbd1
> brw-rw---- 1 root disk 147, 2 Jan 18 11:47 /dev/drbd2
> brw-rw---- 1 root disk 147, 3 Jan 18 11:47 /dev/drbd3
>
> ...
> san2:~ # crm_mon -1
> ============
> Last updated: Fri Jan 18 11:53:03 2013
> Last change: Fri Jan 18 11:52:58 2013 by root via cibadmin on san2
> Stack: openais
> Current DC: san2 - partition with quorum
> Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf
> 2 Nodes configured, 2 expected votes
> 8 Resources configured.
> ============
>
> Online: [ san1 san2 ]
>
> Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0]
> Masters: [ san2 ]
> Slaves: [ san1 ]
> Resource Group: g_iSCSI-san1
> p_iSCSI-san1 (ocf::heartbeat:iSCSITarget): Started san2
> p_iSCSI-san1_0 (ocf::heartbeat:iSCSILogicalUnit): Started san2
> p_iSCSI-san1_1 (ocf::heartbeat:iSCSILogicalUnit): Started san2
> p_iSCSI-san1_2 (ocf::heartbeat:iSCSILogicalUnit): Started san2
> p_iSCSI-san1_3 (ocf::heartbeat:iSCSILogicalUnit): Started san2
> p_IP-1_254 (ocf::heartbeat:IPaddr2): Started san2
From the iSCSI client (xen2):
> xen2:~ # iscsiadm -m discovery -t st -p 192.168.1.254
> 192.168.1.254:3260,1 iqn.2012-11.com.example.san1:sda
> 192.168.0.2:3260,1 iqn.2012-11.com.example.san1:sda
> 192.168.1.2:3260,1 iqn.2012-11.com.example.san1:sda
Problem fixed!
Eric Pretorious
Truckee, CA
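P.S. From there, the client can attach to the target with something like this (using the IQN and portal from the discovery output above):
> xen2:~ # iscsiadm -m node -T iqn.2012-11.com.example.san1:sda -p 192.168.1.254 --login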
From: Eric <epretorious@xxxxxxxxx>
To: linux clustering <linux-cluster@xxxxxxxxxx>
Sent: Thursday, January 17, 2013 8:59 PM
Subject: HA iSCSI with DRBD
I've been attempting to follow the recipe laid out in the Linbit guide "Highly available iSCSI storage with DRBD and Pacemaker" to create a highly-available iSCSI server on the two servers san1 & san2, but can't quite get the details right:
> crm configure property stonith-enabled=false
> crm configure property no-quorum-policy=ignore
>
> crm configure primitive p_IP-1_254 ocf:heartbeat:IPaddr2 params ip=192.168.1.254 cidr_netmask=24 op monitor interval=30s
>
> crm configure primitive p_DRBD-r0 ocf:linbit:drbd params drbd_resource=r0 op monitor interval=60s
> crm configure ms ms_DRBD-r0 p_DRBD-r0 meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
>
> crm configure primitive p_iSCSI-san1 ocf:heartbeat:iSCSITarget params iqn=iqn.2012-11.com.example.san1:sda op monitor interval=10s
> crm configure primitive p_iSCSI-san1_0 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=0 path=/dev/drbd0 op monitor interval=10s
> crm configure primitive p_iSCSI-san1_1 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=1 path=/dev/drbd1 op monitor interval=10s
> crm configure primitive p_iSCSI-san1_2 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=2 path=/dev/drbd2 op monitor interval=10s
> crm configure primitive p_iSCSI-san1_3 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=3 path=/dev/drbd3 op monitor interval=10s
> crm configure primitive p_iSCSI-san1_4 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=4 path=/dev/drbd4 op monitor interval=10s
>
> crm configure group g_iSCSI-san1 p_iSCSI-san1 p_iSCSI-san1_0 p_iSCSI-san1_1 p_iSCSI-san1_2 p_iSCSI-san1_3 p_iSCSI-san1_4 p_IP-1_254
> crm configure order o_DRBD-r0_before_iSCSI-san1 inf: ms_DRBD-r0:promote g_iSCSI-san1:start
> crm configure colocation c_iSCSI_with_DRBD-r0 inf: g_iSCSI-san1 ms_DRBD-r0:Master
> crm configure location l_iSCSI-san1+DRBD-r0 p_IP-1_254 10240: san1
IET (i.e., iscsitarget) is already running (with the default configuration) and DRBD's already correctly configured to create the resource r0...
> resource r0 {
> volume 0 {
> device /dev/drbd0 ;
> disk /dev/sda7 ;
> meta-disk internal ;
> }
> volume 1 {
> device /dev/drbd1 ;
> disk /dev/sda8 ;
> meta-disk internal ;
> }
> volume 2 {
> device /dev/drbd2 ;
> disk /dev/sda9 ;
> meta-disk internal ;
> }
> volume 3 {
> device /dev/drbd3 ;
> disk /dev/sda10 ;
> meta-disk internal ;
> }
> on san1 {
> address 192.168.1.1:7789 ;
> }
> on san2 {
> address 192.168.1.2:7789 ;
> }
> }
But the shared IP address won't start, nor will the LUNs:
> san1:~ # crm_mon -1
> ============
> Last updated: Thu Jan 17 20:55:55 2013
> Last change: Thu Jan 17 20:55:09 2013 by root via cibadmin on san1
> Stack: openais
> Current DC: san1 - partition with quorum
> Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf
> 2 Nodes configured, 2 expected votes
> 9 Resources configured.
> ============
>
> Online: [ san1 san2 ]
>
> Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0]
> Masters: [ san1 ]
> Slaves: [ san2 ]
> Resource Group: g_iSCSI-san1
> p_iSCSI-san1 (ocf::heartbeat:iSCSITarget): Started san1
> p_iSCSI-san1_0 (ocf::heartbeat:iSCSILogicalUnit): Stopped
> p_iSCSI-san1_1 (ocf::heartbeat:iSCSILogicalUnit): Stopped
> p_iSCSI-san1_2 (ocf::heartbeat:iSCSILogicalUnit): Stopped
> p_iSCSI-san1_3 (ocf::heartbeat:iSCSILogicalUnit): Stopped
> p_iSCSI-san1_4 (ocf::heartbeat:iSCSILogicalUnit): Stopped
> p_IP-1_254 (ocf::heartbeat:IPaddr2): Stopped
>
> Failed actions:
> p_iSCSI-san1_0_start_0 (node=san1, call=23, rc=1, status=complete): unknown error
> p_iSCSI-san1_1_start_0 (node=san1, call=26, rc=1, status=complete): unknown error
> p_iSCSI-san1_2_start_0 (node=san1, call=29, rc=1, status=complete): unknown error
> p_iSCSI-san1_3_start_0 (node=san1, call=32, rc=1, status=complete): unknown error
> p_iSCSI-san1_4_start_0 (node=san1, call=35, rc=1, status=complete): unknown error
> p_iSCSI-san1_0_start_0 (node=san2, call=11, rc=1, status=complete): unknown error
> p_iSCSI-san1_1_start_0 (node=san2, call=14, rc=1, status=complete): unknown error
> p_iSCSI-san1_2_start_0 (node=san2, call=17, rc=1, status=complete): unknown error
> p_iSCSI-san1_3_start_0 (node=san2, call=20, rc=1, status=complete): unknown error
> p_iSCSI-san1_4_start_0 (node=san2, call=23, rc=1, status=complete): unknown error
What am I doing wrong?
TIA,
Eric Pretorious
Truckee, CA
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster