Hi Stéphane,

I think I got it. I purged my entire cluster, set the new one up exactly like the old one, and got exactly the same problem again. Then I ran "ceph osd crush tunables optimal", which added the option "chooseleaf_vary_r 1" to the crushmap. After that, everything works fine. Try it on your cluster.
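To verify that the change actually took effect, a minimal sketch (standard ceph CLI; the exact show-tunables output format differs a bit between releases):

~# ceph osd crush tunables optimal   # switch the cluster to the optimal tunables profile
~# ceph osd crush show-tunables      # should now report "chooseleaf_vary_r": 1

Note that switching tunables profiles can trigger a significant amount of data movement on a cluster that already holds data.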
Greetings
Yves

Sent: Tuesday, 24 February 2015, 10:49
From: "Stéphane DUGRAVOT" <stephane.dugravot@xxxxxxxxxxxxxxxx>
To: "Yves Kretzschmar" <YvesKretzschmar@xxxxxx>, ceph-users@xxxxxxxxxxxxxx
Subject: Re: Cluster never reaching clean after osd out
------------------------------------------------------------

I have a cluster of 3 hosts running Debian wheezy and the backports kernel 3.16.0-0.bpo.4-amd64.

For testing, starting from a clean state, I did:

~# ceph osd out 20

Ceph starts rebalancing; watching "ceph -w", one sees the number of pgs stuck unclean rise and then fall to about 11. Shortly after that, the cluster stays stuck forever in this state:

     health HEALTH_WARN 68 pgs stuck unclean; recovery 450/169647 objects degraded (0.265%); 3691/169647 objects misplaced (2.176%)

According to the documentation at http://ceph.com/docs/master/rados/operations/add-or-rm-osds/ the cluster should reach a clean state after an osd is marked out.

What am I doing wrong?

Hi Yves and Cephers,

I have a cluster with 6 nodes and 36 OSDs. I have the same problem:

    cluster 1d0503fb-36d0-4dbc-aabe-a2a0709163cd
     health HEALTH_WARN 76 pgs stuck unclean; recovery 1/624 objects degraded (0.160%); 7/624 objects misplaced (1.122%)
     monmap e6: 6 mons
     osdmap e616: 36 osds: 36 up, 35 in
      pgmap v16344: 2048 pgs, 1 pools, 689 MB data, 208 objects
            178 GB used, 127 TB / 127 TB avail
            1/624 objects degraded (0.160%); 7/624 objects misplaced (1.122%)
                  76 active+remapped
                1972 active+clean

After marking osd.15 'out', ceph did not return to health OK and reports misplaced objects... :-/

I noticed that this happens when I use a replica 3 pool. When the pool uses replica 2, ceph returns to health OK... Have you tried with a replica 2 pool?
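A quick way to compare the two cases is a scratch pool; the pool name and PG count below are arbitrary examples:

~# ceph osd pool create testpool 128 128 replicated   # size defaults to osd_pool_default_size
~# ceph osd pool set testpool size 2                  # 2 replicas; repeat the test with size 3
... write a few objects, mark an OSD out, and watch whether ceph -s returns to HEALTH_OK ...
~# ceph osd pool delete testpool testpool --yes-i-really-really-mean-it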
Likewise, I wonder why the cluster does not return to status OK.

~# ceph osd tree
# id    weight  type name                       up/down reweight
-1000   144     root default
-200    48              datacenter mo
-133    48                      rack mom02
-4      24                              host mom02h01
12      4                                       osd.12  up      1
13      4                                       osd.13  up      1
14      4                                       osd.14  up      1
16      4                                       osd.16  up      1
17      4                                       osd.17  up      1
15      4                                       osd.15  up      0
-5      24                              host mom02h02
18      4                                       osd.18  up      1
19      4                                       osd.19  up      1
20      4                                       osd.20  up      1
21      4                                       osd.21  up      1
22      4                                       osd.22  up      1
23      4                                       osd.23  up      1
-202    48              datacenter me
-135    48                      rack mem04
-6      24                              host mem04h01
24      4                                       osd.24  up      1
25      4                                       osd.25  up      1
26      4                                       osd.26  up      1
27      4                                       osd.27  up      1
28      4                                       osd.28  up      1
29      4                                       osd.29  up      1
-7      24                              host mem04h02
30      4                                       osd.30  up      1
31      4                                       osd.31  up      1
32      4                                       osd.32  up      1
33      4                                       osd.33  up      1
34      4                                       osd.34  up      1
35      4                                       osd.35  up      1
-201    48              datacenter li
-134    48                      rack lis04
-2      24                              host lis04h01
0       4                                       osd.0   up      1
2       4                                       osd.2   up      1
3       4                                       osd.3   up      1
4       4                                       osd.4   up      1
5       4                                       osd.5   up      1
1       4                                       osd.1   up      1
-3      24                              host lis04h02
6       4                                       osd.6   up      1
7       4                                       osd.7   up      1
8       4                                       osd.8   up      1
9       4                                       osd.9   up      1
10      4                                       osd.10  up      1
11      4                                       osd.11  up      1
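When the cluster sits in this state, the standard CLI shows which PGs are stuck and which OSDs they map to (the pg id below is just a placeholder):

~# ceph health detail            # lists each stuck PG individually
~# ceph pg dump_stuck unclean    # stuck PGs with their up/acting OSD sets
~# ceph pg 2.3f query            # full detail for one PG, e.g. its recovery state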
Crushmap:

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11
device 12 osd.12
device 13 osd.13
device 14 osd.14
device 15 osd.15
device 16 osd.16
device 17 osd.17
device 18 osd.18
device 19 osd.19
device 20 osd.20
device 21 osd.21
device 22 osd.22
device 23 osd.23
device 24 osd.24
device 25 osd.25
device 26 osd.26
device 27 osd.27
device 28 osd.28
device 29 osd.29
device 30 osd.30
device 31 osd.31
device 32 osd.32
device 33 osd.33
device 34 osd.34
device 35 osd.35

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host lis04h01 {
        id -2           # do not change unnecessarily
        # weight 24.000
        alg straw
        hash 0  # rjenkins1
        item osd.0 weight 4.000
        item osd.2 weight 4.000
        item osd.3 weight 4.000
        item osd.4 weight 4.000
        item osd.5 weight 4.000
        item osd.1 weight 4.000
}
host lis04h02 {
        id -3           # do not change unnecessarily
        # weight 24.000
        alg straw
        hash 0  # rjenkins1
        item osd.6 weight 4.000
        item osd.7 weight 4.000
        item osd.8 weight 4.000
        item osd.9 weight 4.000
        item osd.10 weight 4.000
        item osd.11 weight 4.000
}
host mom02h01 {
        id -4           # do not change unnecessarily
        # weight 24.000
        alg straw
        hash 0  # rjenkins1
        item osd.12 weight 4.000
        item osd.13 weight 4.000
        item osd.14 weight 4.000
        item osd.16 weight 4.000
        item osd.17 weight 4.000
        item osd.15 weight 4.000
}
host mom02h02 {
        id -5           # do not change unnecessarily
        # weight 24.000
        alg straw
        hash 0  # rjenkins1
        item osd.18 weight 4.000
        item osd.19 weight 4.000
        item osd.20 weight 4.000
        item osd.21 weight 4.000
        item osd.22 weight 4.000
        item osd.23 weight 4.000
}
host mem04h01 {
        id -6           # do not change unnecessarily
        # weight 24.000
        alg straw
        hash 0  # rjenkins1
        item osd.24 weight 4.000
        item osd.25 weight 4.000
        item osd.26 weight 4.000
        item osd.27 weight 4.000
        item osd.28 weight 4.000
        item osd.29 weight 4.000
}
host mem04h02 {
        id -7           # do not change unnecessarily
        # weight 24.000
        alg straw
        hash 0  # rjenkins1
        item osd.30 weight 4.000
        item osd.31 weight 4.000
        item osd.32 weight 4.000
        item osd.33 weight 4.000
        item osd.34 weight 4.000
        item osd.35 weight 4.000
}
rack mom02 {
        id -133         # do not change unnecessarily
        # weight 48.000
        alg straw
        hash 0  # rjenkins1
        item mom02h01 weight 24.000
        item mom02h02 weight 24.000
}
rack lis04 {
        id -134         # do not change unnecessarily
        # weight 48.000
        alg straw
        hash 0  # rjenkins1
        item lis04h01 weight 24.000
        item lis04h02 weight 24.000
}
rack mem04 {
        id -135         # do not change unnecessarily
        # weight 48.000
        alg straw
        hash 0  # rjenkins1
        item mem04h01 weight 24.000
        item mem04h02 weight 24.000
}
datacenter mo {
        id -200         # do not change unnecessarily
        # weight 48.000
        alg straw
        hash 0  # rjenkins1
        item mom02 weight 48.000
}
datacenter li {
        id -201         # do not change unnecessarily
        # weight 48.000
        alg straw
        hash 0  # rjenkins1
        item lis04 weight 48.000
}
datacenter me {
        id -202         # do not change unnecessarily
        # weight 48.000
        alg straw
        hash 0  # rjenkins1
        item mem04 weight 48.000
}
root default {
        id -1000        # do not change unnecessarily
        # weight 144.000
        alg straw
        hash 0  # rjenkins1
        item mo weight 48.000
        item me weight 48.000
        item li weight 48.000
}

# rules
rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type datacenter
        step emit
}
# end crush map

Below some config and command outputs:

~# ceph osd tree
# id    weight  type name       up/down reweight
-1      76.02   root default
-2      25.34           host ve51
0       3.62                    osd.0   up      1
3       3.62                    osd.3   up      1
6       3.62                    osd.6   up      1
9       3.62                    osd.9   up      1
12      3.62                    osd.12  up      1
15      3.62                    osd.15  up      1
18      3.62                    osd.18  up      1
-3      25.34           host ve52
1       3.62                    osd.1   up      1
4       3.62                    osd.4   up      1
7       3.62                    osd.7   up      1
10      3.62                    osd.10  up      1
13      3.62                    osd.13  up      1
16      3.62                    osd.16  up      1
19      3.62                    osd.19  up      1
-4      25.34           host ve53
2       3.62                    osd.2   up      1
5       3.62                    osd.5   up      1
8       3.62                    osd.8   up      1
11      3.62                    osd.11  up      1
14      3.62                    osd.14  up      1
17      3.62                    osd.17  up      1
20      3.62                    osd.20  up      1

==========================

~# cat ceph.conf
[global]
fsid = 80ebba06-34f5-49fc-8178-d6cc1d1c1196
public_network = 192.168.10.0/24
cluster_network = 192.168.10.0/24
mon_initial_members = ve51, ve52, ve53
mon_host = 192.168.10.51,192.168.10.52,192.168.10.53
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
mon_osd_down_out_subtree_limit = host
osd_pool_default_size=3
osd_pool_default_min_size=2

[osd]
osd_journal_size = 20000
osd_mount_options_xfs = noatime,nodiratime,logbsize=256k,logbufs=8,inode64

==========================

~# ceph -s
    cluster 80ebba06-34f5-49fc-8178-d6cc1d1c1196
     health HEALTH_OK
     monmap e1: 3 mons at {ve51=192.168.10.51:6789/0,ve52=192.168.10.52:6789/0,ve53=192.168.10.53:6789/0}, election epoch 28, quorum 0,1,2 ve51,ve52,ve53
     osdmap e1353: 21 osds: 21 up, 21 in
      pgmap v16484: 2048 pgs, 2 pools, 219 GB data, 56549 objects
            658 GB used, 77139 GB / 77797 GB avail
                2048 active+clean

==========================

~# cat crushmap
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11
device 12 osd.12
device 13 osd.13
device 14 osd.14
device 15 osd.15
device 16 osd.16
device 17 osd.17
device 18 osd.18
device 19 osd.19
device 20 osd.20

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host ve51 {
        id -2           # do not change unnecessarily
        # weight 25.340
        alg straw
        hash 0  # rjenkins1
        item osd.0 weight 3.620
        item osd.3 weight 3.620
        item osd.6 weight 3.620
        item osd.9 weight 3.620
        item osd.12 weight 3.620
        item osd.15 weight 3.620
        item osd.18 weight 3.620
}
host ve52 {
        id -3           # do not change unnecessarily
        # weight 25.340
        alg straw
        hash 0  # rjenkins1
        item osd.1 weight 3.620
        item osd.4 weight 3.620
        item osd.7 weight 3.620
        item osd.10 weight 3.620
        item osd.13 weight 3.620
        item osd.16 weight 3.620
        item osd.19 weight 3.620
}
host ve53 {
        id -4           # do not change unnecessarily
        # weight 25.340
        alg straw
        hash 0  # rjenkins1
        item osd.2 weight 3.620
        item osd.5 weight 3.620
        item osd.8 weight 3.620
        item osd.11 weight 3.620
        item osd.14 weight 3.620
        item osd.17 weight 3.620
        item osd.20 weight 3.620
}
root default {
        id -1           # do not change unnecessarily
        # weight 76.020
        alg straw
        hash 0  # rjenkins1
        item ve51 weight 25.340
        item ve52 weight 25.340
        item ve53 weight 25.340
}

# rules
rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
# end crush map
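For reference, crushmaps like the two above can be extracted, edited, and re-injected with the usual round trip; the file names here are arbitrary:

~# ceph osd getcrushmap -o map.bin       # fetch the compiled map from the cluster
~# crushtool -d map.bin -o map.txt       # decompile to the text form shown above
... edit map.txt ...
~# crushtool -c map.txt -o map-new.bin   # recompile
~# crushtool -i map-new.bin --test --rule 0 --num-rep 3 --show-statistics   # sanity-check the mappings offline
~# ceph osd setcrushmap -i map-new.bin   # inject the new map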