Hmm. The monitor code for checking this all looks good to me. Can you
go to one of your monitor nodes and dump the config?
(http://ceph.com/docs/master/rados/configuration/ceph-conf/?highlight=admin%20socket#viewing-a-configuration-at-runtime)
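For example, roughly like this (a sketch assuming the default admin socket
path and your mon.a; adjust the socket name for whichever monitor node you
pick):

  # dump the monitor's running configuration through its admin socket
  ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok config show

  # narrow it down to the down/out handling options,
  # e.g. mon_osd_down_out_interval (default 300 seconds)
  ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok config show | grep mon_osd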
-Greg

On Thu, Mar 28, 2013 at 12:33 PM, Martin Mailand <martin@xxxxxxxxxxxx> wrote:
> Hi,
>
> I get the same behavior on a newly created cluster as well, no changes
> to the cluster config at all.
> I stopped osd.1; after 20 seconds it got marked down, but it never got
> marked out.
>
> ceph version 0.59 (cbae6a435c62899f857775f66659de052fb0e759)
>
> -martin
>
> On 28.03.2013 19:48, John Wilkins wrote:
>> Martin,
>>
>> Greg is talking about noout. With Ceph, you can specifically preclude
>> OSDs from being marked out when down to prevent rebalancing--e.g.,
>> during upgrades, short-term maintenance, etc.
>>
>> http://ceph.com/docs/master/rados/operations/troubleshooting-osd/#stopping-w-out-rebalancing
>>
>> On Thu, Mar 28, 2013 at 11:12 AM, Martin Mailand <martin@xxxxxxxxxxxx> wrote:
>>> Hi Greg,
>>>
>>> Setting the OSD out manually triggered the recovery.
>>> But now the question is: why is the OSD not marked out after 300
>>> seconds? That's a default cluster; I use the 0.59 build from your site.
>>> And I didn't change any value, except for the crushmap.
>>>
>>> That's my ceph.conf:
>>>
>>> -martin
>>>
>>> [global]
>>> auth cluster requierd = none
>>> auth service required = none
>>> auth client required = none
>>> # log file = ""
>>> log_max_recent=100
>>> log_max_new=100
>>>
>>> [mon]
>>> mon data = /data/mon.$id
>>> [mon.a]
>>> host = store1
>>> mon addr = 192.168.195.31:6789
>>> [mon.b]
>>> host = store3
>>> mon addr = 192.168.195.33:6789
>>> [mon.c]
>>> host = store5
>>> mon addr = 192.168.195.35:6789
>>> [osd]
>>> journal aio = true
>>> osd data = /data/osd.$id
>>> osd mount options btrfs = rw,noatime,nodiratime,autodefrag
>>> osd mkfs options btrfs = -n 32k -l 32k
>>>
>>> [osd.0]
>>> host = store1
>>> osd journal = /dev/sdg1
>>> btrfs devs = /dev/sdc
>>> [osd.1]
>>> host = store1
>>> osd journal = /dev/sdh1
>>> btrfs devs = /dev/sdd
>>> [osd.2]
>>> host = store1
>>> osd journal = /dev/sdi1
>>> btrfs devs = /dev/sde
>>> [osd.3]
>>> host = store1
>>> osd journal = /dev/sdj1
>>> btrfs devs = /dev/sdf
>>> [osd.4]
>>> host = store2
>>> osd journal = /dev/sdg1
>>> btrfs devs = /dev/sdc
>>> [osd.5]
>>> host = store2
>>> osd journal = /dev/sdh1
>>> btrfs devs = /dev/sdd
>>> [osd.6]
>>> host = store2
>>> osd journal = /dev/sdi1
>>> btrfs devs = /dev/sde
>>> [osd.7]
>>> host = store2
>>> osd journal = /dev/sdj1
>>> btrfs devs = /dev/sdf
>>> [osd.8]
>>> host = store3
>>> osd journal = /dev/sdg1
>>> btrfs devs = /dev/sdc
>>> [osd.9]
>>> host = store3
>>> osd journal = /dev/sdh1
>>> btrfs devs = /dev/sdd
>>> [osd.10]
>>> host = store3
>>> osd journal = /dev/sdi1
>>> btrfs devs = /dev/sde
>>> [osd.11]
>>> host = store3
>>> osd journal = /dev/sdj1
>>> btrfs devs = /dev/sdf
>>> [osd.12]
>>> host = store4
>>> osd journal = /dev/sdg1
>>> btrfs devs = /dev/sdc
>>> [osd.13]
>>> host = store4
>>> osd journal = /dev/sdh1
>>> btrfs devs = /dev/sdd
>>> [osd.14]
>>> host = store4
>>> osd journal = /dev/sdi1
>>> btrfs devs = /dev/sde
>>> [osd.15]
>>> host = store4
>>> osd journal = /dev/sdj1
>>> btrfs devs = /dev/sdf
>>> [osd.16]
>>> host = store5
>>> osd journal = /dev/sdg1
>>> btrfs devs = /dev/sdc
>>> [osd.17]
>>> host = store5
>>> osd journal = /dev/sdh1
>>> btrfs devs = /dev/sdd
>>> [osd.18]
>>> host = store5
>>> osd journal = /dev/sdi1
>>> btrfs devs = /dev/sde
>>> [osd.19]
>>> host = store5
>>> osd journal = /dev/sdj1
>>> btrfs devs = /dev/sdf
>>> [osd.20]
>>> host = store6
>>> osd journal = /dev/sdg1
>>> btrfs devs = /dev/sdc
>>> [osd.21]
>>> host = store6
>>> osd journal = /dev/sdh1
>>> btrfs devs = /dev/sdd
>>> [osd.22]
>>> host = store6
>>> osd journal = /dev/sdi1
>>> btrfs devs = /dev/sde
>>> [osd.23]
>>> host = store6
>>> osd journal = /dev/sdj1
>>> btrfs devs = /dev/sdf
>>>
>>>
>>> On 28.03.2013 19:01, Gregory Farnum wrote:
>>>> Your crush map looks fine to me. I'm saying that your ceph -s output
>>>> showed the OSD still hadn't been marked out. No data will be migrated
>>>> until it's marked out.
>>>> After ten minutes it should have been marked out, but that's based on
>>>> a number of factors you have some control over. If you just want a
>>>> quick check of your crush map you can mark it out manually, too.
>>>> -Greg
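For reference, the commands mentioned above look like this (using osd.1 from
this thread; the ceph.conf snippet at the end is only a sketch of where the
interval would be pinned, not a recommendation to change it):

  # mark the down OSD out by hand to start rebalancing right away
  ceph osd out 1

  # set/clear the noout flag to keep down OSDs from being marked out
  # (useful during upgrades and short maintenance windows)
  ceph osd set noout
  ceph osd unset noout

  # check which flags are currently set
  ceph osd dump | grep flags

The automatic down-to-out delay is mon_osd_down_out_interval (default 300
seconds); it could be set explicitly in ceph.conf, e.g.:

  [mon]
  mon osd down out interval = 300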