cluster request failed: Host is down

Hi,

I have seen this problem already reported here, but with no useful
answer:

http://osdir.com/ml/linux-lvm/2011-01/msg00038.html

That post suggests it is some very old bug, caused by a change which
could easily be reverted… though that is a bit hard to believe. Such an
easy bug would have been fixed already, wouldn't it?

For me the problem is as follows:

I have a two-node cluster with a volume group running on top of a DRBD
device in Master-Master (dual-primary) mode. When I shut one node down
cleanly, I am no longer able to properly manage the volumes.

LVs which are active on the surviving host remain active, but I am not
able to deactivate them or activate more volumes:

>  [root@dev1n1 ~]# lvs dev1_vg/4bwM2m7oVL
>    cluster request failed: Host is down
>    LV         VG        Attr      LSize Pool Origin Data%  Move Log Copy%  Convert
>    4bwM2m7oVL dev1_vg -wi------ 1.00g                                           
>  [root@dev1n1 ~]# lvchange -aey dev1_vg/XaMS0LyAq8 ; echo $?
>    cluster request failed: Host is down
>    cluster request failed: Host is down
>    cluster request failed: Host is down
>    cluster request failed: Host is down
>    cluster request failed: Host is down
>  5
>  [root@dev1n1 ~]# lvs dev1_vg/4bwM2m7oVL
>    cluster request failed: Host is down
>    LV         VG        Attr      LSize Pool Origin Data%  Move Log Copy%  Convert
>    4bwM2m7oVL dev1_vg -wi------ 1.00g                                           
>  [root@dev1n1 ~]# lvchange -aen dev1_vg/XaMS0LyAq8 ; echo $?
>    cluster request failed: Host is down
>    cluster request failed: Host is down
>  5
>  [root@dev1n1 ~]# lvs dev1_vg/XaMS0LyAq8
>    cluster request failed: Host is down
>    LV         VG        Attr      LSize Pool Origin Data%  Move Log Copy%  Convert
>    XaMS0LyAq8 dev1_vg -wi-a---- 1.00g                                           
>  
>  [root@dev1n1 ~]# dlm_tool ls
>  dlm lockspaces
>  name          clvmd
>  id            0x4104eefa
>  flags         0x00000000 
>  change        member 1 joined 0 remove 1 failed 0 seq 2,2
>  members       1 
>  
>  [root@dev1n1 ~]# dlm_tool status
>  cluster nodeid 1 quorate 1 ring seq 30648 30648
>  daemon now 1115 fence_pid 0 
>  node 1 M add 15 rem 0 fail 0 fence 0 at 0 0
>  node 2 X add 15 rem 184 fail 0 fence 0 at 0 0

The node has cleanly left the lockspace and the cluster. DLM is aware
of that, so clvmd should be too, right? And as long as all remaining
cluster nodes (only one here) are healthy, all LVM operations on the
clustered VG should work, right? Or am I missing something?

The behaviour is exactly the same when I power off a running node: it
is fenced by dlm_tool, as expected, and then the VG is non-functional
as above until the dead node comes back up and rejoins the cluster.
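
In the meantime, the only way I have found to manage the VG at all is
the commonly cited emergency workaround of bypassing cluster locking
for a single command (I am not sure this is the intended recovery path,
and it is obviously only safe while the peer node is genuinely down):

```shell
# Override locking_type for one command so the clustered VG can be
# activated without clvmd. DANGEROUS if the other node can still
# access these LVs, since no distributed locks are taken.
lvchange --config 'global { locking_type = 0 }' -aey dev1_vg/XaMS0LyAq8

# Or clear the clustered flag on the VG entirely, turning it into a
# local VG until the cluster is healthy again:
vgchange -cn dev1_vg --config 'global { locking_type = 0 }'
```

But surely that cannot be the expected procedure for a clean shutdown
of one cluster node?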

Is this the expected behaviour or is it a bug?

Greets,
        Jacek

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
