Re: [RESOLVED] HA iSCSI with DRBD

I realized, quite by accident, that any downtime on either node (e.g., a reboot) causes corruption/inconsistencies in the DRBD resources, because the node that had been DRBD primary (i.e., the preferred primary) will forcibly become primary again when it returns [thereby discarding the modifications made on the node that took over while it was down].

Therefore, in order to prevent this from happening, it's probably best to REMOVE the location constraints that pin the final primitive of each group to a preferred node:

> crm configure location l_iSCSI-san1+DRBD-r0 p_IP-1_254 10240: san1
> crm configure location l_iSCSI-san1+DRBD-r1 p_IP-1_253 10240: san2

This will prevent Pacemaker from automatically re-promoting the preferred-primary node when it returns and overwriting the modifications made on the node that took over in its absence. The DRBD resources can still be moved manually...

> crm resource move p_IP-1_254 san1
> crm resource move p_IP-1_253 san2

...in order to distribute the workload between san1 & san2.
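
(One caveat, if I understand crmsh correctly: "crm resource move" works by inserting a temporary location constraint of its own, so once the resource has settled where it belongs, that constraint should be cleared again with something like:

> crm resource unmove p_IP-1_254
> crm resource unmove p_IP-1_253

...otherwise the manual move simply re-creates the same pin-to-a-node behavior that removing the original constraints was meant to avoid.)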

Thoughts? Suggestions?

Eric Pretorious
Truckee, CA



From: Eric <epretorious@xxxxxxxxx>
To: linux clustering <linux-cluster@xxxxxxxxxx>
Sent: Friday, January 18, 2013 12:40 PM
Subject: Re: [RESOLVED] HA iSCSI with DRBD

After rebooting both nodes, I checked the cluster status again and found this:

> san1:~ # crm_mon -1
> ============
> Last updated: Fri Jan 18 11:51:28 2013
> Last change: Fri Jan 18 09:00:03 2013 by root via cibadmin on san2
> Stack: openais
> Current DC: san2 - partition with quorum
> Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf
> 2 Nodes configured, 2 expected votes
> 9 Resources configured.
> ============
>
> Online: [ san1 san2 ]
>
>  Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0]
>      Masters: [ san2 ]
>      Slaves: [ san1 ]
>  Resource Group: g_iSCSI-san1
>      p_iSCSI-san1    (ocf::heartbeat:iSCSITarget):    Started san2
>      p_iSCSI-san1_0    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>      p_iSCSI-san1_1    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>      p_iSCSI-san1_2    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>      p_iSCSI-san1_3    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>      p_iSCSI-san1_4    (ocf::heartbeat:iSCSILogicalUnit):    Stopped
>      p_IP-1_254    (ocf::heartbeat:IPaddr2):    Stopped
>
> Failed actions:
>     p_iSCSI-san1_4_start_0 (node=san1, call=25, rc=1, status=complete): unknown error
>     p_iSCSI-san1_4_start_0 (node=san2, call=30, rc=1, status=complete): unknown error

...and that's when it occurred to me: there are only four volumes defined in the DRBD configuration (0, 1, 2, & 3) - not five (0, 1, 2, 3, & 4)! i.e., the p_iSCSI-san1_4 primitive was failing (because there is no volume /dev/drbd4) and that, in turn, was holding up the resource group g_iSCSI-san1 and causing all of the other primitives [e.g., p_IP-1_254] to fail too!

So, I deleted p_iSCSI-san1_4 from the CIB and the cluster began working as designed:

> san2:~ # ll /dev/drbd*
> brw-rw---- 1 root disk 147, 0 Jan 18 11:47 /dev/drbd0
> brw-rw---- 1 root disk 147, 1 Jan 18 11:47 /dev/drbd1
> brw-rw---- 1 root disk 147, 2 Jan 18 11:47 /dev/drbd2
> brw-rw---- 1 root disk 147, 3 Jan 18 11:47 /dev/drbd3
>
> ...
>
> san2:~ # crm_mon -1
> ============
> Last updated: Fri Jan 18 11:53:03 2013
> Last change: Fri Jan 18 11:52:58 2013 by root via cibadmin on san2
> Stack: openais
> Current DC: san2 - partition with quorum
> Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf
> 2 Nodes configured, 2 expected votes
> 8 Resources configured.
> ============
>
> Online: [ san1 san2 ]
>
>  Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0]
>      Masters: [ san2 ]
>      Slaves: [ san1 ]
>  Resource Group: g_iSCSI-san1
>      p_iSCSI-san1    (ocf::heartbeat:iSCSITarget):    Started san2
>      p_iSCSI-san1_0    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>      p_iSCSI-san1_1    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>      p_iSCSI-san1_2    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>      p_iSCSI-san1_3    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>      p_IP-1_254    (ocf::heartbeat:IPaddr2):    Started san2
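
(Roughly, the deletion amounts to stopping the orphaned primitive, dropping it from the group, removing it from the configuration, and clearing the old failcounts; something along these lines in crmsh:

> crm resource stop p_iSCSI-san1_4
> crm configure edit g_iSCSI-san1
> crm configure delete p_iSCSI-san1_4
> crm resource cleanup g_iSCSI-san1

...where the edit step just removes p_iSCSI-san1_4 from the group's member list.)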

From the iSCSI client (xen2):

> xen2:~ # iscsiadm -m discovery -t st -p 192.168.1.254
> 192.168.1.254:3260,1 iqn.2012-11.com.example.san1:sda
> 192.168.0.2:3260,1 iqn.2012-11.com.example.san1:sda
> 192.168.1.2:3260,1 iqn.2012-11.com.example.san1:sda

Problem fixed!

Eric Pretorious
Truckee, CA


From: Eric <epretorious@xxxxxxxxx>
To: linux clustering <linux-cluster@xxxxxxxxxx>
Sent: Thursday, January 17, 2013 8:59 PM
Subject: HA iSCSI with DRBD

I've been attempting to follow the recipe laid out in the Linbit guide "Highly available iSCSI storage with DRBD and Pacemaker" to create a highly available iSCSI server on the two servers san1 & san2, but can't quite get the details right:

> crm configure property stonith-enabled=false
> crm configure property no-quorum-policy=ignore
>
> crm configure primitive p_IP-1_254 ocf:heartbeat:IPaddr2 params ip=192.168.1.254 cidr_netmask=24 op monitor interval=30s
>
> crm configure primitive p_DRBD-r0 ocf:linbit:drbd params drbd_resource=r0 op monitor interval=60s
> crm configure ms ms_DRBD-r0 p_DRBD-r0 meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
>
> crm configure primitive p_iSCSI-san1 ocf:heartbeat:iSCSITarget params iqn=iqn.2012-11.com.example.san1:sda op monitor interval=10s
> crm configure primitive p_iSCSI-san1_0 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=0 path=/dev/drbd0 op monitor interval=10s
> crm configure primitive p_iSCSI-san1_1 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=1 path=/dev/drbd1 op monitor interval=10s
> crm configure primitive p_iSCSI-san1_2 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=2 path=/dev/drbd2 op monitor interval=10s
> crm configure primitive p_iSCSI-san1_3 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=3 path=/dev/drbd3 op monitor interval=10s
> crm configure primitive p_iSCSI-san1_4 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=4 path=/dev/drbd4 op monitor interval=10s
>
> crm configure group g_iSCSI-san1 p_iSCSI-san1 p_iSCSI-san1_0 p_iSCSI-san1_1 p_iSCSI-san1_2 p_iSCSI-san1_3 p_iSCSI-san1_4 p_IP-1_254
> crm configure order o_DRBD-r0_before_iSCSI-san1 inf: ms_DRBD-r0:promote g_iSCSI-san1:start
> crm configure colocation c_iSCSI_with_DRBD-r0 inf: g_iSCSI-san1 ms_DRBD-r0:Master
> crm configure location l_iSCSI-san1+DRBD-r0 p_IP-1_254 10240: san1
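
(Before going any further, the pending configuration can be sanity-checked with something like:

> crm configure verify
> crm_verify -LV

...though a syntactically valid CIB can still point a LUN at a DRBD device that doesn't exist; that kind of mistake only shows up when the resource actually tries to start.)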

IET (i.e., iscsitarget) is already running (with the default configuration) and DRBD's already correctly configured to create the resource r0...

> resource r0 {
>     volume 0 {
>         device /dev/drbd0 ;
>         disk /dev/sda7 ;
>         meta-disk internal ;
>     }
>     volume 1 {
>         device /dev/drbd1 ;
>         disk /dev/sda8 ;
>         meta-disk internal ;
>     }
>     volume 2 {
>         device /dev/drbd2 ;
>         disk /dev/sda9 ;
>         meta-disk internal ;
>     }
>     volume 3 {
>         device /dev/drbd3 ;
>         disk /dev/sda10 ;
>         meta-disk internal ;
>     }
>     on san1 {
>         address 192.168.1.1:7789 ;
>     }
>     on san2 {
>         address 192.168.1.2:7789 ;
>     }
> }
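
(The resource and its volumes can be confirmed on both nodes with something like:

> san1:~ # cat /proc/drbd
> san1:~ # drbdadm role r0

...which, with Pacemaker already managing ms_DRBD-r0, should report one node as Primary and the other as Secondary, with every volume UpToDate/UpToDate.)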

But the shared IP address won't start, nor will the LUNs:

> san1:~ # crm_mon -1
> ============
> Last updated: Thu Jan 17 20:55:55 2013
> Last change: Thu Jan 17 20:55:09 2013 by root via cibadmin on san1
> Stack: openais
> Current DC: san1 - partition with quorum
> Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf
> 2 Nodes configured, 2 expected votes
> 9 Resources configured.
> ============
>
> Online: [ san1 san2 ]
>
>  Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0]
>      Masters: [ san1 ]
>      Slaves: [ san2 ]
>  Resource Group: g_iSCSI-san1
>      p_iSCSI-san1    (ocf::heartbeat:iSCSITarget):    Started san1
>      p_iSCSI-san1_0    (ocf::heartbeat:iSCSILogicalUnit):    Stopped
>      p_iSCSI-san1_1    (ocf::heartbeat:iSCSILogicalUnit):    Stopped
>      p_iSCSI-san1_2    (ocf::heartbeat:iSCSILogicalUnit):    Stopped
>      p_iSCSI-san1_3    (ocf::heartbeat:iSCSILogicalUnit):    Stopped
>      p_iSCSI-san1_4    (ocf::heartbeat:iSCSILogicalUnit):    Stopped
>      p_IP-1_254    (ocf::heartbeat:IPaddr2):    Stopped
>
> Failed actions:
>     p_iSCSI-san1_0_start_0 (node=san1, call=23, rc=1, status=complete): unknown error
>     p_iSCSI-san1_1_start_0 (node=san1, call=26, rc=1, status=complete): unknown error
>     p_iSCSI-san1_2_start_0 (node=san1, call=29, rc=1, status=complete): unknown error
>     p_iSCSI-san1_3_start_0 (node=san1, call=32, rc=1, status=complete): unknown error
>     p_iSCSI-san1_4_start_0 (node=san1, call=35, rc=1, status=complete): unknown error
>     p_iSCSI-san1_0_start_0 (node=san2, call=11, rc=1, status=complete): unknown error
>     p_iSCSI-san1_1_start_0 (node=san2, call=14, rc=1, status=complete): unknown error
>     p_iSCSI-san1_2_start_0 (node=san2, call=17, rc=1, status=complete): unknown error
>     p_iSCSI-san1_3_start_0 (node=san2, call=20, rc=1, status=complete): unknown error
>     p_iSCSI-san1_4_start_0 (node=san2, call=23, rc=1, status=complete): unknown error

What am I doing wrong?

TIA,
Eric Pretorious
Truckee, CA

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster



