Sorry we didn't get back to you directly. You were right about this bug,
and it has been resolved in the current "next" branch (which will become
Cuttlefish). See http://tracker.ceph.com/issues/4822 and the "Re:
cuttlefish countdown -- OSD doesn't get marked out" thread for details. :)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Tue, Apr 23, 2013 at 5:33 AM, Aurélien Dunand <adunand@xxxxxxxxxxxx> wrote:
> Hi,
>
> I've got a problem with Ceph 0.60: when an OSD fails, it goes down, but
> it never goes out, even after the down out interval. This behavior works
> in 0.56.4.
>
> When I stop an OSD, it is still not marked out of the cluster after
> mon_osd_down_out_interval seconds:
>
> ceph osd tree
>
> # id    weight  type name                       up/down reweight
> -1      6       root default
> -3      6               rack unknownrack
> -2      2                       host it-test-15.lab
> 0       1                               osd.0   down    1
> 1       1                               osd.1   up      1
> -4      2                       host it-test-16.lab
> 2       1                               osd.2   up      1
> 3       1                               osd.3   up      1
> -5      2                       host it-test-17.lab
> 4       1                               osd.4   up      1
> 5       1                               osd.5   up      1
>
>
> If I manually set this OSD out with 'ceph osd out osd.0', the cluster
> rebalances the data properly.
> This affects Ceph 0.60. With v0.56.4, the OSD is set out after the down
> out interval.
>
> Did I miss something? A new option in v0.60?
>
> It always occurs on a newly created cluster. I did not edit the crushmap
> and I've unset the noout flag just in case.
>
> My ceph.conf:
>
> [global]
>     auth cluster required = none
>     auth service required = none
>     auth client required = none
>
> [mon]
>     mon osd down out interval = 60
>
> [osd]
>     osd mkfs type = xfs
>     osd mkfs options xfs = -f -i size=2048
>     osd mount options xfs = inode64,noatime
>
> [mon.a]
>     host = it-test-8.lab
>     mon addr = 192.168.32.200:6789
>
> [osd.0]
>     host = it-test-15.lab
>     devs = /dev/sda3
>
> [osd.1]
>     host = it-test-15.lab
>     devs = /dev/sdb1
>
> [osd.2]
>     host = it-test-16.lab
>     devs = /dev/sda3
>
> [osd.3]
>     host = it-test-16.lab
>     devs = /dev/sdb1
>
> [osd.4]
>     host = it-test-17.lab
>     devs = /dev/sda3
>
> [osd.5]
>     host = it-test-17.lab
>     devs = /dev/sdb1
>
>
> My running conf about down_out:
>
> ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok config show | grep _out_
>   "mon_osd_adjust_down_out_interval": "true",
>   "mon_osd_auto_mark_auto_out_in": "true",
>   "mon_osd_down_out_interval": "60",
>   "mon_osd_down_out_subtree_limit": "rack",
>
>
> Thanks.
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
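
For anyone checking a cluster for the same symptom before picking up the
fix, here is a minimal sketch (not from the thread) that scans the text
output of 'ceph osd tree' for OSDs that are down but still in. It assumes
the column layout quoted above (id, weight, name, up/down, reweight) and
that a reweight of 0 means the OSD has been marked out. The names
down_but_in and SAMPLE are hypothetical, chosen only for illustration.

```python
# Sketch: list OSDs that are "down" but not yet marked "out",
# based on the plain-text `ceph osd tree` layout quoted in this thread.
# Assumption: OSD rows have five fields (id, weight, name, state, reweight)
# and a reweight of 0 means the OSD is out.

SAMPLE = """\
# id weight type name up/down reweight
-1 6 root default
-3 6 rack unknownrack
-2 2 host it-test-15.lab
0 1 osd.0 down 1
1 1 osd.1 up 1
"""


def down_but_in(tree_text):
    """Return names of OSDs whose state is 'down' and reweight is non-zero."""
    stuck = []
    for line in tree_text.splitlines():
        fields = line.split()
        # Skip the header and bucket rows (root, rack, host).
        if len(fields) != 5 or not fields[2].startswith("osd."):
            continue
        name, state, reweight = fields[2], fields[3], float(fields[4])
        if state == "down" and reweight > 0:
            stuck.append(name)
    return stuck


if __name__ == "__main__":
    print(down_but_in(SAMPLE))
```

On a live cluster the output of 'ceph osd tree' could be passed to
down_but_in() in place of SAMPLE. An OSD that stays in the result well
past mon_osd_down_out_interval matches the behavior reported here.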