Hi
I spoke too soon. I do have issues with the OpenNebula leader node's MON or OSD taking up the floating IP and then going down.
The confusing part is that it doesn't happen on every reboot or leader node transfer, and I don't know how to go about tracking it down.
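One quick way to check which address each daemon actually registered and bound to, in case it helps spot the pattern (standard ceph and netstat commands, run on the node in question):

# monmap as the cluster sees it - shows which IP each mon registered with
sudo ceph mon dump

# local sockets the mon and osd daemons are actually using
sudo netstat -anp | grep ceph-mon
sudo netstat -anp | grep ceph-osd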
Will have to test it some more to find a pattern.
Will get back after the weekend.
Regards,
Rahul
On 29 June 2018 at 10:37, Дробышевский, Владимир <vlad@xxxxxxxxxx> wrote:
Rahul,

if you are using whole drives for OSDs then ceph-deploy is a good option in most cases.

2018-06-28 18:12 GMT+05:00 Rahul S <saple.rahul.eightythree@gmail.com>:

Hi Vlad,

Have not thoroughly tested my setup, but so far things look good. The only problem is that I have to manually activate the OSDs using the ceph-deploy command; manually mounting the OSD partition doesn't work.

Thanks for replying.

Regards,
Rahul S
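For reference, the manual activation mentioned above would look roughly like this with the ceph-disk/older ceph-deploy tooling used in this thread (host name and partition below are placeholders, not taken from the actual setup):

# activate an already prepared OSD partition locally on the node
sudo ceph-disk activate /dev/sdX1

# or via older ceph-deploy releases that still ship the activate subcommand
ceph-deploy osd activate <host>:/dev/sdX1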
On 27 June 2018 at 14:15, Дробышевский, Владимир <vlad@xxxxxxxxxx> wrote:

Hello, Rahul!

Do you have the problem during initial cluster creation or on every reboot/leadership transfer? If the former, try to remove the floating IP while creating the mons and temporarily transfer the leadership away from the server you are going to create the OSD on.

We are using the same configuration without any issues (though with a few more servers), but the ceph cluster had been created before the OpenNebula setup.

We have a number of physical/virtual interfaces on top of IPoIB _and_ an ethernet network (with bonding), so there are 3 interfaces for the internal communications:

ib0.8003 - 10.103.0.0/16 - ceph public network and opennebula raft virtual ip
ib0.8004 - 10.104.0.0/16 - ceph cluster network
br0 (on top of the ethernet bonding interface) - 10.101.0.0/16 - physical "management" network

We also have a number of other virtual interfaces for per-tenant intra-VM networks (vxlan on top of IP) and so on.

In /etc/hosts we have only the "fixed" IPs from the 10.103.0.0/16 network, like:

10.103.0.1 e001n01.dc1.xxxxxxxx.xx e001n01

/etc/one/oned.conf:

# Executed when a server transits from follower->leader
RAFT_LEADER_HOOK = [
    COMMAND   = "raft/vip.sh",
    ARGUMENTS = "leader ib0.8003 10.103.255.254/16"
]

# Executed when a server transits from leader->follower
RAFT_FOLLOWER_HOOK = [
    COMMAND   = "raft/vip.sh",
    ARGUMENTS = "follower ib0.8003 10.103.255.254/16"
]

/etc/ceph/ceph.conf:

[global]
public_network  = 10.103.0.0/16
cluster_network = 10.104.0.0/16
mon_initial_members = e001n01, e001n02, e001n03
mon_host = 10.103.0.1,10.103.0.2,10.103.0.3

The cluster and mons were created with ceph-deploy; each OSD has been added with a modified ceph-disk.py (as we have only 3 drive slots per server, we had to co-locate the system partition with the OSD partition on our SSDs) on a per-host/drive basis:

admin@<host>:~$ sudo ./ceph-disk-mod.py -v prepare --dmcrypt --dmcrypt-key-dir /etc/ceph/dmcrypt-keys --bluestore --cluster ceph --fs-type xfs -- /dev/sda

And the current state on the leader:

oneadmin@e001n02:~/remotes/tm$ onezone show 0
ZONE 0 INFORMATION
ID   : 0
NAME : OpenNebula

ZONE SERVERS
ID NAME     ENDPOINT
 0 e001n01  http://10.103.0.1:2633/RPC2
 1 e001n02  http://10.103.0.2:2633/RPC2
 2 e001n03  http://10.103.0.3:2633/RPC2

HA & FEDERATION SYNC STATUS
ID NAME     STATE     TERM  INDEX     COMMIT    VOTE  FED_INDEX
 0 e001n01  follower  1571  68250418  68250417  1     -1
 1 e001n02  leader    1571  68250418  68250418  1     -1
 2 e001n03  follower  1571  68250418  68250417  -1    -1
...

admin@e001n02:~$ ip addr show ib0.8003
9: ib0.8003@ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc mq state UP group default qlen 256
    link/infiniband a0:00:03:00:fe:80:00:00:00:00:00:00:00:1e:67:03:00:47:c1:1b brd 00:ff:ff:ff:ff:12:40:1b:80:03:00:00:00:00:00:00:ff:ff:ff:ff
    inet 10.103.0.2/16 brd 10.103.255.255 scope global ib0.8003
       valid_lft forever preferred_lft forever
    inet 10.103.255.254/16 scope global secondary ib0.8003
       valid_lft forever preferred_lft forever
    inet6 fe80::21e:6703:47:c11b/64 scope link
       valid_lft forever preferred_lft forever

admin@e001n02:~$ sudo netstat -anp | grep mon
tcp  0  0  10.103.0.2:6789   0.0.0.0:*          LISTEN       168752/ceph-mon
tcp  0  0  10.103.0.2:6789   10.103.0.2:44270   ESTABLISHED  168752/ceph-mon
...

admin@e001n02:~$ sudo netstat -anp | grep osd
tcp  0  0  10.104.0.2:6800   0.0.0.0:*          LISTEN       6736/ceph-osd
tcp  0  0  10.104.0.2:6801   0.0.0.0:*          LISTEN       6736/ceph-osd
tcp  0  0  10.103.0.2:6801   0.0.0.0:*          LISTEN       6736/ceph-osd
tcp  0  0  10.103.0.2:6802   0.0.0.0:*          LISTEN       6736/ceph-osd
tcp  0  0  10.104.0.2:6801   10.104.0.6:42868   ESTABLISHED  6736/ceph-osd
tcp  0  0  10.104.0.2:51788  10.104.0.1:6800    ESTABLISHED  6736/ceph-osd
...

admin@e001n02:~$ sudo ceph -s
  cluster:
    id:     <uuid>
    health: HEALTH_OK

oneadmin@e001n02:~/remotes/tm$ onedatastore show 0
DATASTORE 0 INFORMATION
ID        : 0
NAME      : system
USER      : oneadmin
GROUP     : oneadmin
CLUSTERS  : 0
TYPE      : SYSTEM
DS_MAD    : -
TM_MAD    : ceph_shared
BASE PATH : /var/lib/one//datastores/0
DISK_TYPE : RBD
STATE     : READY
...
DATASTORE TEMPLATE
ALLOW_ORPHANS="YES"
BRIDGE_LIST="e001n01 e001n02 e001n03"
CEPH_HOST="e001n01 e001n02 e001n03"
CEPH_SECRET="secret_uuid"
CEPH_USER="libvirt"
DEFAULT_DEVICE_PREFIX="sd"
DISK_TYPE="RBD"
DS_MIGRATE="NO"
POOL_NAME="rbd-ssd"
RESTRICTED_DIRS="/"
SAFE_DIRS="/mnt"
SHARED="YES"
TM_MAD="ceph_shared"
TYPE="SYSTEM_DS"
...

oneadmin@e001n02:~/remotes/tm$ onedatastore show 1
DATASTORE 1 INFORMATION
ID        : 1
NAME      : default
USER      : oneadmin
GROUP     : oneadmin
CLUSTERS  : 0
TYPE      : IMAGE
DS_MAD    : ceph
TM_MAD    : ceph_shared
BASE PATH : /var/lib/one//datastores/1
DISK_TYPE : RBD
STATE     : READY
...
DATASTORE TEMPLATE
ALLOW_ORPHANS="YES"
BRIDGE_LIST="e001n01 e001n02 e001n03"
CEPH_HOST="e001n01 e001n02 e001n03"
CEPH_SECRET="secret_uuid"
CEPH_USER="libvirt"
CLONE_TARGET="SELF"
DISK_TYPE="RBD"
DRIVER="raw"
DS_MAD="ceph"
LN_TARGET="NONE"
POOL_NAME="rbd-ssd"
SAFE_DIRS="/mnt /var/lib/one/datastores/tmp"
STAGING_DIR="/var/lib/one/datastores/tmp"
TM_MAD="ceph_shared"
TYPE="IMAGE_DS"
IMAGES
...

Leadership transfers without any issues as well.

BR

2018-06-26 13:17 GMT+05:00 Rahul S <saple.rahul.eightythree@gmail.com>:

Hi! In my organisation we are using OpenNebula as our Cloud Platform. Currently we are testing the High Availability (HA) feature with a Ceph cluster as our storage backend. In our test setup we have 3 systems with front-end HA already successfully set up and configured with a floating IP between them. We have our ceph cluster (3 OSDs and 3 mons) on these same 3 machines. However, when we try to deploy the ceph cluster, we get a successful quorum but with the following issues on the OpenNebula 'LEADER' node:
1) The mon daemon successfully starts, but takes up the floating IP rather than the actual IP.
2) The osd daemon, on the other hand, goes down after a while with the error:
log_channel(cluster) log [ERR] : map e29 had wrong cluster addr (192.x.x.20:6801/10821 != my 192.x.x.245:6801/10821)
192.x.x.20 being the floating IP
192.x.x.245 being the actual IP
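A minimal ceph.conf sketch that would pin the daemons to their fixed address instead of letting them pick up the floating IP (the mon/OSD names and the address below are placeholders; public addr and cluster addr are standard per-daemon options, and the daemons need a restart after such a change):

[mon.node1]
    public addr = 192.x.x.245

[osd.0]
    public addr  = 192.x.x.245
    cluster addr = 192.x.x.245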
Apart from that, we get a HEALTH_WARN status when running ceph -s, with many PGs in a degraded, unclean, undersized state.
Also, if it matters, we have our OSDs on a separate partition rather than a whole disk. We only need to get the cluster into a healthy state in our minimalistic setup. Any idea on how to get past this?

Thanks and Regards,
Rahul S
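For completeness, a rough shell sketch of the workaround suggested earlier in the thread (drop the floating IP before creating the mons and move the Raft leadership away from the node that gets the new OSD); the interface name, prefix and service name are assumptions, not taken from the actual setup:

# on the current leader: remove the floating IP before creating the mons
sudo ip addr del 192.x.x.20/24 dev eth0

# force a new Raft election so a follower takes over the leadership
# (the OpenNebula daemon is usually the "opennebula" service)
sudo systemctl restart opennebula

# then create the mons while the VIP is not present locally
ceph-deploy mon create-initial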
--
Best regards,
Дробышевский Владимир
Company "АйТи Город"
+7 343 2222192
IT consulting
Turnkey project delivery
IT services outsourcing
IT infrastructure outsourcing
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com