Hi,

I've got a problem with Ceph 0.60: when an OSD fails, it goes down, but it never goes out, even after the down out interval. This works correctly in 0.56.4.

When I stop an OSD, after mon_osd_down_out_interval seconds, the OSD is not set out of the cluster:

ceph osd tree

# id    weight  type name                       up/down reweight
-1      6       root default
-3      6           rack unknownrack
-2      2               host it-test-15.lab
0       1                   osd.0               down    1
1       1                   osd.1               up      1
-4      2               host it-test-16.lab
2       1                   osd.2               up      1
3       1                   osd.3               up      1
-5      2               host it-test-17.lab
4       1                   osd.4               up      1
5       1                   osd.5               up      1

If I manually set the OSD out with 'ceph osd out osd.0', the cluster rebalances the data properly.

This affects Ceph 0.60. With v0.56.4, the OSD is set out after the down out interval. Did I miss something? A new option in v0.60? It always occurs, even on a newly created cluster. I did not edit the crushmap, and I've unset the noout flag just in case.

My ceph.conf:

[global]
        auth cluster required = none
        auth service required = none
        auth client required = none

[mon]
        mon osd down out interval = 60

[osd]
        osd mkfs type = xfs
        osd mkfs options xfs = -f -i size=2048
        osd mount options xfs = inode64,noatime

[mon.a]
        host = it-test-8.lab
        mon addr = 192.168.32.200:6789

[osd.0]
        host = it-test-15.lab
        devs = /dev/sda3

[osd.1]
        host = it-test-15.lab
        devs = /dev/sdb1

[osd.2]
        host = it-test-16.lab
        devs = /dev/sda3

[osd.3]
        host = it-test-16.lab
        devs = /dev/sdb1

[osd.4]
        host = it-test-17.lab
        devs = /dev/sda3

[osd.5]
        host = it-test-17.lab
        devs = /dev/sdb1

My running config for the down_out settings:

ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok config show | grep _out_
  "mon_osd_adjust_down_out_interval": "true",
  "mon_osd_auto_mark_auto_out_in": "true",
  "mon_osd_down_out_interval": "60",
  "mon_osd_down_out_subtree_limit": "rack",

Thanks.
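
P.S. For reference, here is roughly the sequence I use to reproduce it (sysvinit init script; the stop command is run on the OSD's host, here it-test-15.lab):

    service ceph stop osd.0        # mark osd.0 down
    sleep 120                      # wait well past mon_osd_down_out_interval (60s)
    ceph osd tree                  # osd.0 is still down, reweight still 1 (never out)
    ceph osd dump | grep flags     # confirm the noout flag is not set
    ceph osd out osd.0             # manual workaround: rebalancing starts immediately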