Re: Down OSD never goes out with v0.60

Gregory Farnum <greg@xxxxxxxxxxx> · Fri, 26 Apr 2013 12:39:18 -0700



Sorry we didn't get back to you directly. You were right about this
bug, and it has been resolved in the current next branch (which will
become Cuttlefish). See http://tracker.ceph.com/issues/4822 and the
"Re: cuttlefish countdown -- OSD doesn't get marked out" thread for
details. :)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Tue, Apr 23, 2013 at 5:33 AM, Aurélien Dunand <adunand@xxxxxxxxxxxx> wrote:
> Hi,
>
> I've got a problem with Ceph 0.60: when an OSD fails, it goes down, but
> it never goes out, even after the down out interval has elapsed. This
> works as expected in 0.56.4.
>
> When I stop an OSD, it is still not marked out of the cluster after
> mon_osd_down_out_interval seconds:
>
> ceph osd tree
>
> # id    weight  type name       up/down reweight
> -1      6       root default
> -3      6               rack unknownrack
> -2      2                   host it-test-15.lab
> 0       1                       osd.0   down        1
> 1       1                       osd.1   up          1
> -4      2                   host it-test-16.lab
> 2       1                       osd.2   up          1
> 3       1                       osd.3   up          1
> -5      2                   host it-test-17.lab
> 4       1                       osd.4   up          1
> 5       1                       osd.5   up          1
>
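To spot this state without reading the tree by eye, the plain-text output above can be scanned for OSDs that are down but still in (reweight still 1; an OSD that has been marked out shows reweight 0). The following is a minimal Python sketch that assumes exactly the five-column OSD row layout shown above (id, weight, name, up/down, reweight); the function name down_but_in is hypothetical, and other Ceph releases may format the tree differently.

```python
# Sketch: flag OSDs that are "down" but still "in" (reweight > 0).
# Assumes the plain-text `ceph osd tree` layout shown above; other
# releases may use different columns.

TREE = """\
# id    weight  type name       up/down reweight
-1      6       root default
-3      6               rack unknownrack
-2      2                   host it-test-15.lab
0       1                       osd.0   down        1
1       1                       osd.1   up          1
-4      2                   host it-test-16.lab
2       1                       osd.2   up          1
3       1                       osd.3   up          1
-5      2                   host it-test-17.lab
4       1                       osd.4   up          1
5       1                       osd.5   up          1
"""


def down_but_in(tree_text: str) -> list[str]:
    """Return names of OSD rows whose status is 'down' and reweight > 0."""
    result = []
    for line in tree_text.splitlines():
        fields = line.split()
        # OSD rows have 5 fields and a non-negative id; buckets
        # (root/rack/host) have negative ids, the header starts with '#'.
        if len(fields) != 5 or fields[0].startswith(("#", "-")):
            continue
        _id, _weight, name, status, reweight = fields
        if status == "down" and float(reweight) > 0:
            result.append(name)
    return result


print(down_but_in(TREE))  # ['osd.0']
```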
>
> If I manually mark this OSD out with 'ceph osd out osd.0', the cluster
> rebalances the data properly.
> This affects Ceph 0.60. With v0.56.4, the OSD is marked out after the
> down out interval.
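The behavior expected here (and seen with 0.56.4) can be summarised as: once an OSD has been down for mon_osd_down_out_interval seconds, the monitor marks it out, unless the noout flag is set or mon_osd_down_out_subtree_limit applies. The Python sketch below is a simplified, hypothetical model of that decision, not Ceph's monitor code: it ignores mon_osd_adjust_down_out_interval and any other checks the real monitor makes, and OsdState and should_mark_out are illustrative names.

```python
# Simplified, hypothetical model of the monitor's down -> out decision.
# Not Ceph's monitor code; mon_osd_adjust_down_out_interval is ignored.
from dataclasses import dataclass


@dataclass
class OsdState:
    """Hypothetical per-OSD record used only by this sketch."""
    name: str
    down_since: float  # time (seconds) at which the OSD was marked down
    is_in: bool = True


def should_mark_out(osd: OsdState, now: float, down_out_interval: float,
                    noout_flag: bool, whole_subtree_down: bool) -> bool:
    """Return True if this simplified model would auto-mark a down OSD out.

    whole_subtree_down stands in for mon_osd_down_out_subtree_limit: True
    means every OSD under the enclosing bucket of that type (e.g. 'rack')
    is down, in which case the OSDs are left in.
    """
    if not osd.is_in or noout_flag or whole_subtree_down:
        return False
    return now - osd.down_since >= down_out_interval


# osd.0 went down at t=1000 s, with the poster's 60 s interval.
osd0 = OsdState("osd.0", down_since=1000.0)
print(should_mark_out(osd0, now=1030.0, down_out_interval=60,
                      noout_flag=False, whole_subtree_down=False))  # False
print(should_mark_out(osd0, now=1061.0, down_out_interval=60,
                      noout_flag=False, whole_subtree_down=False))  # True
```

Under this model, osd.0 should go out roughly 60 seconds after going down with the poster's settings; per Greg's reply, the 0.60 behavior was a bug (issue 4822) rather than a missing option.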
>
> Did I miss something? Is there a new option in v0.60?
>
> It always occurs on a newly created cluster. I did not edit the crushmap,
> and I've unset the noout flag just in case.
>
> My ceph.conf:
>
> [global]
>     auth cluster required = none
>     auth service required = none
>     auth client required = none
>
> [mon]
>     mon osd down out interval = 60
>
> [osd]
>     osd mkfs type = xfs
>     osd mkfs options xfs = -f -i size=2048
>     osd mount options xfs = inode64,noatime
>
> [mon.a]
>     host = it-test-8.lab
>     mon addr = 192.168.32.200:6789
>
> [osd.0]
>     host = it-test-15.lab
>     devs = /dev/sda3
>
> [osd.1]
>     host = it-test-15.lab
>     devs = /dev/sdb1
>
> [osd.2]
>     host = it-test-16.lab
>     devs = /dev/sda3
>
> [osd.3]
>     host = it-test-16.lab
>     devs = /dev/sdb1
>
> [osd.4]
>     host = it-test-17.lab
>     devs = /dev/sda3
>
> [osd.5]
>     host = it-test-17.lab
>     devs = /dev/sdb1
>
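Ceph treats spaces and underscores in option names as equivalent, so 'mon osd down out interval' in the [mon] section is the same setting as mon_osd_down_out_interval. As a hedged illustration only, the Python sketch below reads an excerpt of this file with the standard configparser module and normalizes the names; it is not how Ceph itself parses ceph.conf (for example, it does not expand metavariables such as $host), and load_ceph_conf and normalize are hypothetical helpers.

```python
import configparser

# Excerpt of the ceph.conf above; configparser accepts the indented keys.
CEPH_CONF = """\
[global]
    auth cluster required = none
    auth service required = none
    auth client required = none

[mon]
    mon osd down out interval = 60

[osd]
    osd mkfs type = xfs
    osd mkfs options xfs = -f -i size=2048
    osd mount options xfs = inode64,noatime
"""


def normalize(name: str) -> str:
    """Map 'mon osd down out interval' to 'mon_osd_down_out_interval'."""
    return "_".join(name.split())


def load_ceph_conf(text: str) -> dict[str, dict[str, str]]:
    # interpolation=None keeps values such as '-f -i size=2048' verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        section: {normalize(key): value for key, value in parser.items(section)}
        for section in parser.sections()
    }


conf = load_ceph_conf(CEPH_CONF)
print(conf["mon"]["mon_osd_down_out_interval"])  # 60
```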
>
> My running configuration for the down/out options:
>
> ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok config show | grep _out_
>   "mon_osd_adjust_down_out_interval": "true",
>   "mon_osd_auto_mark_auto_out_in": "true",
>   "mon_osd_down_out_interval": "60",
>   "mon_osd_down_out_subtree_limit": "rack",
>
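The admin-socket lines above confirm that the running monitor picked up the 60-second interval. To check these values from a script, a minimal Python sketch, assuming the output keeps the '"key": "value",' shape shown above, is to wrap the grepped lines back into a JSON object; parse_config_lines is a hypothetical helper.

```python
import json

# The four lines printed by `config show | grep _out_` above.
GREP_OUTPUT = """\
  "mon_osd_adjust_down_out_interval": "true",
  "mon_osd_auto_mark_auto_out_in": "true",
  "mon_osd_down_out_interval": "60",
  "mon_osd_down_out_subtree_limit": "rack",
"""


def parse_config_lines(text: str) -> dict[str, str]:
    """Turn '"key": "value",' lines into a dict by rebuilding a JSON object."""
    pairs = [line.strip().rstrip(",") for line in text.splitlines() if line.strip()]
    return json.loads("{" + ",".join(pairs) + "}")


settings = parse_config_lines(GREP_OUTPUT)
interval = int(settings["mon_osd_down_out_interval"])
print(interval, settings["mon_osd_down_out_subtree_limit"])  # 60 rack
```

For reference, mon_osd_down_out_subtree_limit = rack is documented as the smallest CRUSH bucket type that the monitor will not automatically mark out when all of its OSDs are down. Here all six OSDs sit in the single rack unknownrack, so that limit should only matter if every OSD were down, not when osd.0 alone fails.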
>
> Thanks.
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com