Re: If I shut down 2 of 3 OSDs, why does the Ceph cluster say 2 OSDs are up?


On 16-12-22 13:26, Stéphane Klein wrote:
Hi,

I have:

* 3 mon
* 3 osd

When I shut down one OSD, everything works great:

    cluster 7ecb6ebd-2e7a-44c3-bf0d-ff8d193e03ac
     health HEALTH_WARN
            43 pgs degraded
            43 pgs stuck unclean
            43 pgs undersized
            recovery 24/70 objects degraded (34.286%)
            too few PGs per OSD (28 < min 30)
            1/3 in osds are down
     monmap e1: 3 mons at {ceph-mon-1=172.28.128.2:6789/0,ceph-mon-2=172.28.128.3:6789/0,ceph-mon-3=172.28.128.4:6789/0}
            election epoch 10, quorum 0,1,2 ceph-mon-1,ceph-mon-2,ceph-mon-3
     osdmap e22: 3 osds: 2 up, 3 in; 43 remapped pgs
            flags sortbitwise,require_jewel_osds
      pgmap v169: 64 pgs, 1 pools, 77443 kB data, 35 objects
            252 MB used, 1484 GB / 1484 GB avail
            24/70 objects degraded (34.286%)
                  43 active+undersized+degraded
                  21 active+clean

But when I shut down 2 OSDs, the Ceph cluster doesn't see that the second OSD node is down :(

root@ceph-mon-1:/home/vagrant# ceph status
    cluster 7ecb6ebd-2e7a-44c3-bf0d-ff8d193e03ac
     health HEALTH_WARN
            clock skew detected on mon.ceph-mon-2
            pauserd,pausewr,sortbitwise,require_jewel_osds flag(s) set
            Monitor clock skew detected
     monmap e1: 3 mons at {ceph-mon-1=172.28.128.2:6789/0,ceph-mon-2=172.28.128.3:6789/0,ceph-mon-3=172.28.128.4:6789/0}
            election epoch 10, quorum 0,1,2 ceph-mon-1,ceph-mon-2,ceph-mon-3
     osdmap e26: 3 osds: 2 up, 2 in
            flags pauserd,pausewr,sortbitwise,require_jewel_osds
      pgmap v203: 64 pgs, 1 pools, 77443 kB data, 35 objects
            219 MB used, 989 GB / 989 GB avail
                  64 active+clean

2 OSDs up! Why?

root@ceph-mon-1:/home/vagrant# ping ceph-osd-1 -c1
--- ceph-osd-1 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms

root@ceph-mon-1:/home/vagrant# ping ceph-osd-2 -c1
--- ceph-osd-2 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms

root@ceph-mon-1:/home/vagrant# ping ceph-osd-3 -c1
--- ceph-osd-3 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.278/0.278/0.278/0.000 ms

My configuration:

ceph_conf_overrides:
   global:
      osd_pool_default_size: 2
      osd_pool_default_min_size: 1

What is my mistake? Is it a Ceph bug?

Try waiting a little longer. The monitor needs several down reports before it will mark an OSD down, and because your cluster is very small there is only a small number of OSDs left (one, in this case) to report that the others are down.
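If you need the monitors on a tiny test cluster like this to react faster, the relevant knobs are the mon down-reporting settings. A minimal sketch in the same ceph_conf_overrides style as your configuration (the option names are the standard Jewel-era mon settings; the values are only illustrative, not production recommendations):

ceph_conf_overrides:
   global:
      osd_pool_default_size: 2
      osd_pool_default_min_size: 1
      # accept a failure report from a single surviving OSD
      # (the Jewel default of 2 cannot be reached once 2 of your 3 OSDs are gone)
      mon_osd_min_down_reporters: 1
      # mark an OSD down when the mon has heard nothing from it for this long,
      # even without any peer reports (default: 900 seconds)
      mon_osd_report_timeout: 120

With the defaults, your last remaining OSD alone cannot satisfy mon_osd_min_down_reporters, so the monitors fall back to mon_osd_report_timeout and the missing OSDs should only be marked down after roughly 15 minutes; that is the "waiting a little longer" above. In the meantime you can check what the monitors currently believe with "ceph osd tree".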



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


