Very interesting documentation about this subject is here: http://docs.ceph.com/docs/hammer/rados/configuration/mon-osd-interaction/
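In short, that page describes how OSDs heartbeat each other and how peer failure reports reach the monitors. Below is a minimal sketch of the knobs involved, written in the same ceph_conf_overrides form as the configuration quoted further down (the option values here are only illustrative, not a recommendation for your cluster):

ceph_conf_overrides:
  global:
    # how often an OSD pings its heartbeat peers, in seconds
    osd_heartbeat_interval: 6
    # how long a peer may stay silent before it is reported down
    osd_heartbeat_grace: 20
    # how many distinct OSDs must report a peer down before the
    # monitors will mark it down
    mon_osd_min_down_reporters: 2
    # if too few peers survive to report the failure, the monitors
    # mark an OSD down themselves after this many seconds without
    # a status report from it
    mon_osd_report_timeout: 900

If only one OSD is left alive to report its dead peers, it may not satisfy mon_osd_min_down_reporters on its own, and the monitors then have to wait out mon_osd_report_timeout before marking the others down.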
2016-12-22 12:26 GMT+01:00 Stéphane Klein <contact@xxxxxxxxxxxxxxxxxxx>:
Hi,

I have:
* 3 osd
* 3 mon

When I shut down one OSD, it works great:
cluster 7ecb6ebd-2e7a-44c3-bf0d-ff8d193e03ac
health HEALTH_WARN
43 pgs degraded
43 pgs stuck unclean
43 pgs undersized
recovery 24/70 objects degraded (34.286%)
too few PGs per OSD (28 < min 30)
1/3 in osds are down
monmap e1: 3 mons at {ceph-mon-1=172.28.128.2:6789/0,ceph-mon-2=172.28.128.3:6789/0,ceph-mon-3=172.28.128.4:6789/0}
election epoch 10, quorum 0,1,2 ceph-mon-1,ceph-mon-2,ceph-mon-3
osdmap e22: 3 osds: 2 up, 3 in; 43 remapped pgs
flags sortbitwise,require_jewel_osds
pgmap v169: 64 pgs, 1 pools, 77443 kB data, 35 objects
252 MB used, 1484 GB / 1484 GB avail
24/70 objects degraded (34.286%)
43 active+undersized+degraded
21 active+clean

But when I shut down 2 OSDs, the Ceph cluster doesn't see that the second OSD node is down :(
root@ceph-mon-1:/home/vagrant# ceph status
cluster 7ecb6ebd-2e7a-44c3-bf0d-ff8d193e03ac
health HEALTH_WARN
clock skew detected on mon.ceph-mon-2
pauserd,pausewr,sortbitwise,require_jewel_osds flag(s) set
Monitor clock skew detected
monmap e1: 3 mons at {ceph-mon-1=172.28.128.2:6789/0,ceph-mon-2=172.28.128.3:6789/0,ceph-mon-3=172.28.128.4:6789/0}
election epoch 10, quorum 0,1,2 ceph-mon-1,ceph-mon-2,ceph-mon-3
osdmap e26: 3 osds: 2 up, 2 in
flags pauserd,pausewr,sortbitwise,require_jewel_osds
pgmap v203: 64 pgs, 1 pools, 77443 kB data, 35 objects
219 MB used, 989 GB / 989 GB avail
64 active+clean

2 osds up! Why?
root@ceph-mon-1:/home/vagrant# ping ceph-osd-1 -c1
--- ceph-osd-1 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
root@ceph-mon-1:/home/vagrant# ping ceph-osd-2 -c1
--- ceph-osd-2 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
root@ceph-mon-1:/home/vagrant# ping ceph-osd-3 -c1
--- ceph-osd-3 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.278/0.278/0.278/0.000 ms

My configuration:
ceph_conf_overrides:
global:
osd_pool_default_size: 2
osd_pool_default_min_size: 1

Full Ansible configuration is here: https://github.com/harobed/poc-ceph-ansible/blob/master/vagrant-3mons-3osd/hosts/group_vars/all.yml#L11

What is my mistake? Is it a Ceph bug?

Best regards,
Stéphane
--
Stéphane Klein <contact@xxxxxxxxxxxxxxxxxxx>
blog: http://stephane-klein.info
cv : http://cv.stephane-klein.info
Twitter: http://twitter.com/klein_stephane
--
Stéphane Klein <contact@xxxxxxxxxxxxxxxxxxx>
blog: http://stephane-klein.info
cv : http://cv.stephane-klein.info
Twitter: http://twitter.com/klein_stephane
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com