Very interesting documentation about this subject is here: http://docs.ceph.com/docs/hammer/rados/configuration/mon-osd-interaction/
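In short, that page describes how OSDs heartbeat each other and how peer failure reports reach the monitors. Below is a minimal sketch of the knobs involved, written in the same ceph_conf_overrides form as the configuration quoted further down (the option values here are only illustrative, not a recommendation for your cluster):

ceph_conf_overrides:
  global:
    # how often an OSD pings its heartbeat peers, in seconds
    osd_heartbeat_interval: 6
    # how long a peer may stay silent before it is reported down
    osd_heartbeat_grace: 20
    # how many distinct OSDs must report a peer down before the
    # monitors will mark it down
    mon_osd_min_down_reporters: 2
    # if too few peers survive to report the failure, the monitors
    # mark an OSD down themselves after this many seconds without
    # a status report from it
    mon_osd_report_timeout: 900

If only one OSD is left alive to report its dead peers, it may not satisfy mon_osd_min_down_reporters on its own, and the monitors then have to wait out mon_osd_report_timeout before marking the others down.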
2016-12-22 12:26 GMT+01:00 Stéphane Klein <contact@xxxxxxxxxxxxxxxxxxx>:
Hi,

I have:
* 3 osd
* 3 mon

When I shut down one OSD, it works great:
cluster 7ecb6ebd-2e7a-44c3-bf0d-ff8d193e03ac
health HEALTH_WARN
43 pgs degraded
43 pgs stuck unclean
43 pgs undersized
recovery 24/70 objects degraded (34.286%)
too few PGs per OSD (28 < min 30)
1/3 in osds are down
monmap e1: 3 mons at {ceph-mon-1=172.28.128.2:6789/0,ceph-mon-2=172.28.128.3:6789/0,ceph-mon-3=172.28.128.4:6789/0}
election epoch 10, quorum 0,1,2 ceph-mon-1,ceph-mon-2,ceph-mon-3
osdmap e22: 3 osds: 2 up, 3 in; 43 remapped pgs
flags sortbitwise,require_jewel_osds
pgmap v169: 64 pgs, 1 pools, 77443 kB data, 35 objects
252 MB used, 1484 GB / 1484 GB avail
24/70 objects degraded (34.286%)
43 active+undersized+degraded
21 active+clean

But when I shut down 2 OSDs, the Ceph cluster doesn't see that the second OSD node is down :(
root@ceph-mon-1:/home/vagrant# ceph status
cluster 7ecb6ebd-2e7a-44c3-bf0d-ff8d193e03ac
health HEALTH_WARN
clock skew detected on mon.ceph-mon-2
pauserd,pausewr,sortbitwise,require_jewel_osds flag(s) set
Monitor clock skew detected
monmap e1: 3 mons at {ceph-mon-1=172.28.128.2:6789/0,ceph-mon-2=172.28.128.3:6789/0,ceph-mon-3=172.28.128.4:6789/0}
election epoch 10, quorum 0,1,2 ceph-mon-1,ceph-mon-2,ceph-mon-3
osdmap e26: 3 osds: 2 up, 2 in
flags pauserd,pausewr,sortbitwise,require_jewel_osds
pgmap v203: 64 pgs, 1 pools, 77443 kB data, 35 objects
219 MB used, 989 GB / 989 GB avail
64 active+clean

2 osds up! Why?
root@ceph-mon-1:/home/vagrant# ping ceph-osd-1 -c1
--- ceph-osd-1 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
root@ceph-mon-1:/home/vagrant# ping ceph-osd-2 -c1
--- ceph-osd-2 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
root@ceph-mon-1:/home/vagrant# ping ceph-osd-3 -c1
--- ceph-osd-3 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.278/0.278/0.278/0.000 ms

My configuration:
ceph_conf_overrides:
global:
osd_pool_default_size: 2
osd_pool_default_min_size: 1

Full Ansible configuration is here: https://github.com/harobed/poc-ceph-ansible/blob/master/vagrant-3mons-3osd/hosts/group_vars/all.yml#L11

What is my mistake? Is it a Ceph bug?

Best regards,
Stéphane
--
Stéphane Klein <contact@xxxxxxxxxxxxxxxxxxx>
blog: http://stephane-klein.info
cv : http://cv.stephane-klein.info
Twitter: http://twitter.com/klein_stephane
--
Stéphane Klein <contact@xxxxxxxxxxxxxxxxxxx>
blog: http://stephane-klein.info
cv : http://cv.stephane-klein.info
Twitter: http://twitter.com/klein_stephane
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com