On a 4-node cluster (admin + 3 mon/osd nodes) I see the following shortly
after rebooting the cluster and waiting for a couple of minutes:
root@rts23:~# ps -ef | grep ceph && ceph osd tree
root      4183     1  0 12:09 ?        00:00:00 /usr/bin/ceph-mon --cluster=ceph -i rts23 -f
root      5771  5640  0 12:30 pts/0    00:00:00 grep --color=auto ceph
# id    weight  type name       up/down reweight
-1      0.94    root default
-2      0.31            host rts22
0       0.31                    osd.0   down    0
-3      0.31            host rts21
1       0.31                    osd.1   up      1
-4      0.32            host rts23
2       0.32                    osd.2   up      1
It seems rather odd that Ceph reports 2 OSDs as up while ps shows no OSD
daemons running at all (the ceph osd tree output is identical on all 4 nodes).
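For reference, this is roughly how I've been cross-checking on each node. The
init-script invocation is a guess based on a stock sysvinit install; adjust if
your packages use upstart or systemd:

    # mon's view of the OSD map (should agree with "ceph osd tree")
    ceph osd stat
    ceph osd dump | grep '^osd\.'

    # per-node check that OSD daemons are actually running
    # (the [c] trick keeps grep from matching itself)
    ps -ef | grep '[c]eph-osd'
    sudo /etc/init.d/ceph status     # sysvinit; on upstart: initctl list | grep ceph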
ceph status shows:
root@rts23:~# ceph status
    cluster 6149cebd-b619-4709-9fec-00fd8bc210a3
     health HEALTH_WARN 192 pgs degraded; 192 pgs stale; 192 pgs stuck stale;
            192 pgs stuck unclean; recovery 10242/20484 objects degraded (50.000%);
            2/2 in osds are down; clock skew detected on mon.rts23
     monmap e1: 3 mons at {rts21=172.29.0.21:6789/0,rts22=172.29.0.22:6789/0,rts23=172.29.0.23:6789/0},
            election epoch 12, quorum 0,1,2 rts21,rts22,rts23
     osdmap e25: 3 osds: 0 up, 2 in
      pgmap v445: 192 pgs, 3 pools, 40960 MB data, 10242 objects
            10305 MB used, 641 GB / 651 GB avail
            10242/20484 objects degraded (50.000%)
                 192 stale+active+degraded
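As an aside, the clock skew warning suggests the mon clocks drifted across the
reboot; I was planning to verify NTP sync on each mon node with the usual
tooling (assuming ntpd is what's installed, nothing Ceph-specific):

    ntpq -p                    # each mon should show a synced peer (a '*' entry)
    sudo service ntp restart   # kick the daemon if the clocks need to resync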
How can OSDs be "up" when no OSD daemons are running in the cluster?
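(I realize I could force the map to agree with reality, roughly as sketched
below, but I'd like to understand why the mons still consider these OSDs up in
the first place. The init-script path is again an assumption for a sysvinit
install:)

    # tell the mons to mark the OSDs down; they rejoin once the daemons actually start
    ceph osd down 1 2
    # then try starting the OSD daemons by hand on each node
    sudo /etc/init.d/ceph start osd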
MTIA,
dk