Re: In a High Availability setup, MON and OSD daemons take up the floating IP

Дробышевский, Владимир <vlad@xxxxxxxxxx> · Wed, 27 Jun 2018 13:45:58 +0500

Hello, Rahul!

  Does the problem happen during initial cluster creation, or on every reboot/leadership transfer? If the former, try removing the floating IP while creating the mons, and temporarily transfer leadership away from the server you are going to create an OSD on.

  We use the same configuration without any issues (though with a few more servers), but our Ceph cluster was created before OpenNebula was set up.

  We have a number of physical/virtual interfaces on top of IPoIB _and_ an Ethernet network (with bonding).

  So there are 3 interfaces for internal communication:

  ib0.8003 - 10.103.0.0/16 - Ceph public network and OpenNebula Raft virtual IP
  ib0.8004 - 10.104.0.0/16 - Ceph cluster network
  br0 (on top of the Ethernet bonding interface) - 10.101.0.0/16 - physical "management" network

  We also have a number of other virtual interfaces for per-tenant intra-VM networks (VXLAN on top of IP) and so on.
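One way to see why the floating IP can get in the way: the Raft VIP 10.103.255.254 lies inside the Ceph public network, so on the current leader two local addresses match public_network. The sketch below only classifies an address against the three networks listed above using Python's standard ipaddress module; the NETWORKS table and the classify() helper are hypothetical names for illustration, not part of Ceph or OpenNebula.

```python
from __future__ import annotations

import ipaddress

# Hypothetical lookup table built from the interface list above.
NETWORKS = {
    ipaddress.ip_network("10.103.0.0/16"): "ceph public + raft VIP (ib0.8003)",
    ipaddress.ip_network("10.104.0.0/16"): "ceph cluster (ib0.8004)",
    ipaddress.ip_network("10.101.0.0/16"): "management (br0)",
}


def classify(addr: str) -> str:
    """Return the role of the first listed network containing addr, else 'unknown'."""
    ip = ipaddress.ip_address(addr)
    for net, role in NETWORKS.items():
        if ip in net:
            return role
    return "unknown"


if __name__ == "__main__":
    for a in ("10.103.0.2", "10.103.255.254", "10.104.0.2", "192.168.1.1"):
        print(f"{a:>15} -> {classify(a)}")
```

Both a node's fixed address and the VIP classify as public-network addresses, which is why the address a daemon picks on the leader deserves a closer look.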

In /etc/hosts we have only the "fixed" IPs from the 10.103.0.0/16 network, like:

10.103.0.1      e001n01.dc1.xxxxxxxx.xx        e001n01
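Since every hostname should resolve to a fixed address and never to the floating IP, a quick check can be scripted. This is a minimal sketch, assuming the usual /etc/hosts layout (IP followed by names, "#" comments); parse_hosts(), names_on_vip() and the VIP constant are hypothetical helpers, and the sample text is just the line shown above (domain still elided).

```python
from __future__ import annotations

import ipaddress

# Floating IP from the RAFT_*_HOOK arguments in oned.conf.
VIP = ipaddress.ip_address("10.103.255.254")

# The /etc/hosts line shown above (the domain is elided in the original post).
HOSTS_TEXT = """\
10.103.0.1      e001n01.dc1.xxxxxxxx.xx        e001n01
"""


def parse_hosts(text: str) -> dict[str, str]:
    """Map every hostname and alias to its IP, ignoring comments and blank lines."""
    mapping: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping[name] = ip
    return mapping


def names_on_vip(mapping: dict[str, str]) -> list[str]:
    """Hostnames that resolve to the floating IP; this list should stay empty."""
    return [name for name, ip in mapping.items() if ipaddress.ip_address(ip) == VIP]
```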

  /etc/one/oned.conf:

# Executed when a server transits from follower->leader
 RAFT_LEADER_HOOK = [
     COMMAND = "raft/vip.sh",
     ARGUMENTS = "leader ib0.8003 10.103.255.254/16"
 ]

# Executed when a server transits from leader->follower
 RAFT_FOLLOWER_HOOK = [
     COMMAND = "raft/vip.sh",
     ARGUMENTS = "follower ib0.8003 10.103.255.254/16"
 ]
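We assume the stock raft/vip.sh does roughly the following with its three arguments (action, interface, address): add the address to the interface on the leader and delete it on a follower. The function below only builds the corresponding iproute2 command from an ARGUMENTS string so it can be inspected; vip_command() is a hypothetical helper, and the real script may do more (for example, announcing the address on the network), so check the script shipped with your OpenNebula version.

```python
import shlex


def vip_command(arguments: str) -> list:
    """Build (but do not run) the `ip address` command for a RAFT hook ARGUMENTS string.

    Assumption: 'leader' adds the floating IP and 'follower' removes it.
    """
    action, interface, cidr = shlex.split(arguments)
    verbs = {"leader": "add", "follower": "del"}
    if action not in verbs:
        raise ValueError(f"unexpected hook action: {action!r}")
    return ["ip", "address", verbs[action], cidr, "dev", interface]
```

For example, vip_command("leader ib0.8003 10.103.255.254/16") yields ip address add 10.103.255.254/16 dev ib0.8003.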

  /etc/ceph/ceph.conf:

[global]
public_network = 10.103.0.0/16
cluster_network = 10.104.0.0/16

mon_initial_members = e001n01, e001n02, e001n03
mon_host = 10.103.0.1,10.103.0.2,10.103.0.3
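With mon_host pinned to the fixed 10.103.0.x addresses, the monitors should not advertise the VIP. As a quick sanity check, the sketch below reads the [global] section with Python's configparser and flags any mon_host entry that is the floating IP or lies outside public_network; check_mon_hosts() and the VIP constant are hypothetical. If an individual daemon still binds to the wrong address, Ceph's per-daemon public_addr / cluster_addr options may be worth a look in the Ceph network configuration reference; they are not part of the setup shown here.

```python
from __future__ import annotations

import configparser
import ipaddress

# The [global] section exactly as shown above.
CEPH_CONF = """\
[global]
public_network = 10.103.0.0/16
cluster_network = 10.104.0.0/16

mon_initial_members = e001n01, e001n02, e001n03
mon_host = 10.103.0.1,10.103.0.2,10.103.0.3
"""

# OpenNebula Raft floating IP (from the RAFT_*_HOOK arguments).
VIP = ipaddress.ip_address("10.103.255.254")


def check_mon_hosts(conf_text: str) -> list[str]:
    """Return problems found in mon_host; an empty list means it looks sane."""
    cfg = configparser.ConfigParser()
    cfg.read_string(conf_text)
    glob = cfg["global"]
    public = ipaddress.ip_network(glob["public_network"])
    problems = []
    for raw in glob["mon_host"].split(","):
        ip = ipaddress.ip_address(raw.strip())
        if ip == VIP:
            problems.append(f"{ip} is the floating IP")
        elif ip not in public:
            problems.append(f"{ip} is outside public_network {public}")
    return problems
```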

  The cluster and mons were created with ceph-deploy; each OSD was then added per host/drive with a modified ceph-disk.py (we have only 3 drive slots per server, so we had to co-locate the system partition with the OSD partition on our SSDs):

admin@<host>:~$ sudo ./ceph-disk-mod.py -v prepare --dmcrypt --dmcrypt-key-dir /etc/ceph/dmcrypt-keys --bluestore --cluster ceph --fs-type xfs -- /dev/sda
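Because the OSDs are prepared one host and one drive at a time, and preferably on a node that is not currently the Raft leader, a thin wrapper that prints or runs the same command can help. prepare_cmd() and prepare() are hypothetical helpers; ceph-disk-mod.py is the locally modified script mentioned above, not a stock Ceph tool, and the flags are copied verbatim from the command above.

```python
import subprocess


def prepare_cmd(device: str) -> list:
    """argv for preparing one OSD with the locally modified ceph-disk-mod.py."""
    return [
        "sudo", "./ceph-disk-mod.py", "-v", "prepare",
        "--dmcrypt", "--dmcrypt-key-dir", "/etc/ceph/dmcrypt-keys",
        "--bluestore", "--cluster", "ceph", "--fs-type", "xfs",
        "--", device,
    ]


def prepare(device: str, dry_run: bool = True) -> list:
    """Return the command; only execute it when dry_run is explicitly False."""
    cmd = prepare_cmd(device)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

Defaulting to a dry run keeps the helper safe to call while checking which host and drive it targets.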

  And the current state on the leader:

oneadmin@e001n02:~/remotes/tm$ onezone show 0
ZONE 0 INFORMATION
ID                : 0
NAME              : OpenNebula

ZONE SERVERS
ID NAME            ENDPOINT
 0 e001n01         http://10.103.0.1:2633/RPC2
 1 e001n02         http://10.103.0.2:2633/RPC2
 2 e001n03         http://10.103.0.3:2633/RPC2

HA & FEDERATION SYNC STATUS
ID NAME            STATE      TERM       INDEX      COMMIT     VOTE  FED_INDEX
 0 e001n01         follower   1571       68250418   68250417   1     -1
 1 e001n02         leader     1571       68250418   68250418   1     -1
 2 e001n03         follower   1571       68250418   68250417   -1    -1
...
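To script the "move leadership away first" step, it helps to know which node is the leader. The sketch below reads the text printed by onezone show (pasted into ONEZONE_SHOW rather than calling the CLI) and picks out the leader; raft_states() and current_leader() are hypothetical helpers, and the parser relies on the column layout shown above.

```python
from __future__ import annotations

from typing import Optional

ONEZONE_SHOW = """\
ZONE SERVERS
ID NAME            ENDPOINT
 0 e001n01         http://10.103.0.1:2633/RPC2
 1 e001n02         http://10.103.0.2:2633/RPC2
 2 e001n03         http://10.103.0.3:2633/RPC2

HA & FEDERATION SYNC STATUS
ID NAME            STATE      TERM       INDEX      COMMIT     VOTE  FED_INDEX
 0 e001n01         follower   1571       68250418   68250417   1     -1
 1 e001n02         leader     1571       68250418   68250418   1     -1
 2 e001n03         follower   1571       68250418   68250417   -1    -1
"""


def raft_states(text: str) -> dict[str, str]:
    """Map server NAME -> STATE using the 'HA & FEDERATION SYNC STATUS' table."""
    states: dict[str, str] = {}
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if fields[:3] == ["ID", "NAME", "STATE"]:
            in_table = True
        elif in_table and len(fields) >= 3 and fields[0].isdigit():
            states[fields[1]] = fields[2]
        elif in_table:
            break  # the first non-row line ends the table
    return states


def current_leader(text: str) -> Optional[str]:
    """Name of the server whose STATE is 'leader', if any."""
    return next((name for name, st in raft_states(text).items() if st == "leader"), None)
```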

admin@e001n02:~$ ip addr show ib0.8003
9: ib0.8003@ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc mq state UP group default qlen 256
    link/infiniband a0:00:03:00:fe:80:00:00:00:00:00:00:00:1e:67:03:00:47:c1:1b brd 00:ff:ff:ff:ff:12:40:1b:80:03:00:00:00:00:00:00:ff:ff:ff:ff
    inet 10.103.0.2/16 brd 10.103.255.255 scope global ib0.8003
       valid_lft forever preferred_lft forever
    inet 10.103.255.254/16 scope global secondary ib0.8003
       valid_lft forever preferred_lft forever
    inet6 fe80::21e:6703:47:c11b/64 scope link
       valid_lft forever preferred_lft forever
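Note that the floating IP shows up as a secondary address while 10.103.0.2 stays primary: Linux marks an additional address from an already-configured subnet as secondary. On the affected nodes it may be worth checking which address is primary when the Ceph daemons start. The sketch below extracts (address, is_secondary) pairs from ip addr show output; ipv4_addresses() is a hypothetical helper that parses the text format above and ignores inet6 lines.

```python
from __future__ import annotations

import re

# Excerpt of the `ip addr show ib0.8003` output above.
IP_ADDR_OUTPUT = """\
    inet 10.103.0.2/16 brd 10.103.255.255 scope global ib0.8003
       valid_lft forever preferred_lft forever
    inet 10.103.255.254/16 scope global secondary ib0.8003
       valid_lft forever preferred_lft forever
    inet6 fe80::21e:6703:47:c11b/64 scope link
       valid_lft forever preferred_lft forever
"""

# Matches IPv4 "inet" lines only; "inet6" fails because "6" follows "inet".
INET_RE = re.compile(r"^\s*inet (\S+) (.*)$")


def ipv4_addresses(text: str) -> list[tuple[str, bool]]:
    """Return (cidr, is_secondary) for each IPv4 address, in the order listed."""
    found = []
    for line in text.splitlines():
        m = INET_RE.match(line)
        if m:
            cidr, rest = m.groups()
            found.append((cidr, "secondary" in rest.split()))
    return found
```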

admin@e001n02:~$ sudo netstat -anp | grep mon
tcp        0      0 10.103.0.2:6789         0.0.0.0:*               LISTEN      168752/ceph-mon
tcp        0      0 10.103.0.2:6789         10.103.0.2:44270        ESTABLISHED 168752/ceph-mon
...
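The monitor here listens only on the node's fixed address 10.103.0.2, not on the floating IP, which is the state a mon on the leader should end up in. The sketch below pulls LISTEN addresses for a given program out of netstat -anp output; listening_ips() is hypothetical and assumes the column order shown above (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name).

```python
from __future__ import annotations

NETSTAT_MON = """\
tcp        0      0 10.103.0.2:6789         0.0.0.0:*               LISTEN      168752/ceph-mon
tcp        0      0 10.103.0.2:6789         10.103.0.2:44270        ESTABLISHED 168752/ceph-mon
"""

VIP = "10.103.255.254"  # OpenNebula Raft floating IP


def listening_ips(text: str, program: str) -> set[str]:
    """Local IPs on which `program` holds a TCP LISTEN socket (IPv4 netstat -anp text)."""
    ips: set[str] = set()
    for line in text.splitlines():
        f = line.split()
        if (len(f) >= 7 and f[0] == "tcp" and f[5] == "LISTEN"
                and f[6].endswith("/" + program)):
            ips.add(f[3].rsplit(":", 1)[0])
    return ips


if __name__ == "__main__":
    mon_ips = listening_ips(NETSTAT_MON, "ceph-mon")
    print(sorted(mon_ips), "VIP in use:", VIP in mon_ips)
```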

admin@e001n02:~$ sudo netstat -anp | grep osd
tcp        0      0 10.104.0.2:6800         0.0.0.0:*               LISTEN      6736/ceph-osd
tcp        0      0 10.104.0.2:6801         0.0.0.0:*               LISTEN      6736/ceph-osd
tcp        0      0 10.103.0.2:6801         0.0.0.0:*               LISTEN      6736/ceph-osd
tcp        0      0 10.103.0.2:6802         0.0.0.0:*               LISTEN      6736/ceph-osd
tcp        0      0 10.104.0.2:6801         10.104.0.6:42868        ESTABLISHED 6736/ceph-osd
tcp        0      0 10.104.0.2:51788        10.104.0.1:6800         ESTABLISHED 6736/ceph-osd
...
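Here the OSD listens on 10.104.0.2 (cluster network) and 10.103.0.2 (public network), never on the VIP. In the error from the original question, the cluster addr recorded in the map was the floating IP, which suggests (a guess, since that ceph.conf isn't shown) that on those nodes the VIP shares a subnet with the address the OSD uses for cluster traffic. The sketch below groups OSD listen sockets by network, checking the VIP first because it lies inside the public network; osd_sockets_by_network() and the constants are hypothetical.

```python
from __future__ import annotations

import ipaddress

PUBLIC = ipaddress.ip_network("10.103.0.0/16")   # public_network from ceph.conf
CLUSTER = ipaddress.ip_network("10.104.0.0/16")  # cluster_network from ceph.conf
VIP = ipaddress.ip_address("10.103.255.254")     # floating IP, inside PUBLIC

NETSTAT_OSD = """\
tcp        0      0 10.104.0.2:6800         0.0.0.0:*               LISTEN      6736/ceph-osd
tcp        0      0 10.104.0.2:6801         0.0.0.0:*               LISTEN      6736/ceph-osd
tcp        0      0 10.103.0.2:6801         0.0.0.0:*               LISTEN      6736/ceph-osd
tcp        0      0 10.103.0.2:6802         0.0.0.0:*               LISTEN      6736/ceph-osd
tcp        0      0 10.104.0.2:6801         10.104.0.6:42868        ESTABLISHED 6736/ceph-osd
tcp        0      0 10.104.0.2:51788        10.104.0.1:6800         ESTABLISHED 6736/ceph-osd
"""


def osd_sockets_by_network(text: str) -> dict[str, set[str]]:
    """Group ceph-osd LISTEN sockets by network; the VIP is checked before PUBLIC."""
    groups: dict[str, set[str]] = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 7 or f[5] != "LISTEN" or not f[6].endswith("/ceph-osd"):
            continue
        ip = ipaddress.ip_address(f[3].rsplit(":", 1)[0])
        if ip == VIP:
            label = "floating IP"
        elif ip in PUBLIC:
            label = "public"
        elif ip in CLUSTER:
            label = "cluster"
        else:
            label = "other"
        groups.setdefault(label, set()).add(f[3])
    return groups
```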

admin@e001n02:~$ sudo ceph -s
  cluster:
    id:     <uuid>
    health: HEALTH_OK

oneadmin@e001n02:~/remotes/tm$ onedatastore show 0
DATASTORE 0 INFORMATION
ID             : 0
NAME           : system
USER           : oneadmin
GROUP          : oneadmin
CLUSTERS       : 0
TYPE           : SYSTEM
DS_MAD         : -
TM_MAD         : ceph_shared
BASE PATH      : /var/lib/one//datastores/0
DISK_TYPE      : RBD
STATE          : READY

...

DATASTORE TEMPLATE
ALLOW_ORPHANS="YES"
BRIDGE_LIST="e001n01 e001n02 e001n03"
CEPH_HOST="e001n01 e001n02 e001n03"
CEPH_SECRET="secret_uuid"
CEPH_USER="libvirt"
DEFAULT_DEVICE_PREFIX="sd"
DISK_TYPE="RBD"
DS_MIGRATE="NO"
POOL_NAME="rbd-ssd"
RESTRICTED_DIRS="/"
SAFE_DIRS="/mnt"
SHARED="YES"
TM_MAD="ceph_shared"
TYPE="SYSTEM_DS"

...

oneadmin@e001n02:~/remotes/tm$ onedatastore show 1
DATASTORE 1 INFORMATION
ID             : 1
NAME           : default
USER           : oneadmin
GROUP          : oneadmin
CLUSTERS       : 0
TYPE           : IMAGE
DS_MAD         : ceph
TM_MAD         : ceph_shared
BASE PATH      : /var/lib/one//datastores/1
DISK_TYPE      : RBD
STATE          : READY

...

DATASTORE TEMPLATE
ALLOW_ORPHANS="YES"
BRIDGE_LIST="e001n01 e001n02 e001n03"
CEPH_HOST="e001n01 e001n02 e001n03"
CEPH_SECRET="secret_uuid"
CEPH_USER="libvirt"
CLONE_TARGET="SELF"
DISK_TYPE="RBD"
DRIVER="raw"
DS_MAD="ceph"
LN_TARGET="NONE"
POOL_NAME="rbd-ssd"
SAFE_DIRS="/mnt /var/lib/one/datastores/tmp"
STAGING_DIR="/var/lib/one/datastores/tmp"
TM_MAD="ceph_shared"
TYPE="IMAGE_DS"

IMAGES
...
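Both datastores share the same Ceph settings, and CEPH_HOST / BRIDGE_LIST use hostnames, which (via the /etc/hosts entries above) resolve to the fixed 10.103.0.x addresses rather than to the VIP. The sketch below parses a subset of the KEY="VALUE" lines shown above and reports any Ceph-related attribute that differs between the two datastores; the SHARED_KEYS selection and the helper names are our own hypothetical choices.

```python
from __future__ import annotations

SYSTEM_DS = """\
BRIDGE_LIST="e001n01 e001n02 e001n03"
CEPH_HOST="e001n01 e001n02 e001n03"
CEPH_SECRET="secret_uuid"
CEPH_USER="libvirt"
POOL_NAME="rbd-ssd"
TM_MAD="ceph_shared"
TYPE="SYSTEM_DS"
"""

IMAGE_DS = """\
BRIDGE_LIST="e001n01 e001n02 e001n03"
CEPH_HOST="e001n01 e001n02 e001n03"
CEPH_SECRET="secret_uuid"
CEPH_USER="libvirt"
DS_MAD="ceph"
POOL_NAME="rbd-ssd"
TM_MAD="ceph_shared"
TYPE="IMAGE_DS"
"""

SHARED_KEYS = ("BRIDGE_LIST", "CEPH_HOST", "CEPH_SECRET", "CEPH_USER", "POOL_NAME", "TM_MAD")


def parse_template(text: str) -> dict[str, str]:
    """Parse KEY="VALUE" lines as printed under DATASTORE TEMPLATE."""
    attrs: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            attrs[key.strip()] = value.strip().strip('"')
    return attrs


def mismatches(a: str, b: str) -> dict[str, tuple]:
    """Shared Ceph-related keys whose values differ between two templates."""
    ta, tb = parse_template(a), parse_template(b)
    return {k: (ta.get(k), tb.get(k)) for k in SHARED_KEYS if ta.get(k) != tb.get(k)}
```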

Leadership transfers work without any issues as well.

BR

2018-06-26 13:17 GMT+05:00 Rahul S <saple.rahul.eightythree@xxxxxxxxx>:
Hi! In my organisation we are using OpenNebula as our cloud platform. We are currently testing the High Availability (HA) feature with a Ceph cluster as our storage backend. Our test setup has 3 systems on which front-end HA is already set up and configured with a floating IP shared between them. Our Ceph cluster (3 OSDs and 3 mons) runs on these same 3 machines. However, when we deploy the Ceph cluster, we get a successful quorum but with the following issues on the OpenNebula 'LEADER' node:

    1) The mon daemon starts successfully, but takes the floating IP rather than the actual IP.

    2) The OSD daemon, on the other hand, goes down after a while with the error:
    log_channel(cluster) log [ERR] : map e29 had wrong cluster addr (192.x.x.20:6801/10821 != my 192.x.x.245:6801/10821)
    where 192.x.x.20 is the floating IP and 192.x.x.245 is the actual IP.
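Reading the error: "wrong cluster addr (A != my B)" means the OSD map at epoch 29 lists A as this OSD's cluster address, while the running OSD believes its own address is B. One plausible interpretation, based on the log line alone, is that the OSD registered itself with the floating IP and later noticed the mismatch. The sketch below extracts those fields from such a line; parse_wrong_addr() and its regular expression are hypothetical and only cover single-word address kinds such as "cluster".

```python
from __future__ import annotations

import re
from typing import Optional

LOG_LINE = ("log_channel(cluster) log [ERR] : map e29 had wrong cluster addr "
            "(192.x.x.20:6801/10821 != my 192.x.x.245:6801/10821)")

WRONG_ADDR = re.compile(
    r"map e(?P<epoch>\d+) had wrong (?P<kind>\w+) addr "
    r"\((?P<in_map>\S+) != my (?P<mine>\S+)\)"
)


def parse_wrong_addr(line: str) -> Optional[dict]:
    """Return epoch, address kind, and the two IPs from a 'wrong ... addr' error."""
    m = WRONG_ADDR.search(line)
    if m is None:
        return None
    # Addresses look like ip:port/nonce; keep only the ip part.
    return {
        "epoch": int(m["epoch"]),
        "kind": m["kind"],
        "in_map": m["in_map"].rsplit(":", 1)[0],
        "mine": m["mine"].rsplit(":", 1)[0],
    }
```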

Apart from that, running ceph -s reports HEALTH_WARN, with many PGs in a degraded, unclean, undersized state.

Also, in case it matters, our OSDs are on a separate partition rather than a whole disk.

We only need to get the cluster into a healthy state in our minimal setup. Any idea how to get past this?

Thanks and Regards,
Rahul S

_______________________________________________

ceph-users mailing list

ceph-users@xxxxxxxxxxxxxx

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 

Best regards,
Дробышевский Владимир
Company "АйТи Город"
+7 343 2222192

IT consulting
Turnkey project delivery
IT services outsourcing
IT infrastructure outsourcing
