Hello Bruno,

I am not understanding your outputs. The first 'ceph -s' says one mon is down, but your 'ceph health detail' does not report it any further. In your crush map I count 7 OSDs (0, 1, 2, 3, 4, 6, 7), but 'ceph -s' says only 6 are up and in. Can you send the output of 'ceph osd tree', 'ceph osd df' and 'ceph osd dump'?

Regards,
Goncalo

________________________________________
From: ceph-users [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of Bruno Silva [bemanuel.pe@xxxxxxxxx]
Sent: 19 November 2016 11:48
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Ceph Down on Cluster

Hi, thanks.

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable straw_calc_version 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 device5
device 6 osd.6
device 7 osd.7

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host pxm00node01 {
	id -2		# do not change unnecessarily
	# weight 0.540
	alg straw
	hash 0	# rjenkins1
	item osd.0 weight 0.540
}
host pmx00node03 {
	id -3		# do not change unnecessarily
	# weight 0.540
	alg straw
	hash 0	# rjenkins1
	item osd.1 weight 0.540
}
host pxmnode04 {
	id -4		# do not change unnecessarily
	# weight 0.000
	alg straw
	hash 0	# rjenkins1
}
host pmx00node04 {
	id -5		# do not change unnecessarily
	# weight 0.530
	alg straw
	hash 0	# rjenkins1
	item osd.2 weight 0.530
}
host pmx00node01 {
	id -6		# do not change unnecessarily
	# weight 1.080
	alg straw
	hash 0	# rjenkins1
	item osd.6 weight 0.540
	item osd.7 weight 0.540
}
host pmx00node02 {
	id -7		# do not change unnecessarily
	# weight 0.530
	alg straw
	hash 0	# rjenkins1
	item osd.3 weight 0.530
}
host pmx00node05 {
	id -8		# do not change unnecessarily
	# weight 0.530
	alg straw
	hash 0	# rjenkins1
	item osd.4 weight 0.530
}
root default {
	id -1		# do not change unnecessarily
	# weight 3.750
	alg straw
	hash 0	# rjenkins1
	item pxm00node01 weight 0.540
	item pmx00node03 weight 0.540
	item pxmnode04 weight 0.000
	item pmx00node04 weight 0.530
	item pmx00node01 weight 1.080
	item pmx00node02 weight 0.530
	item pmx00node05 weight 0.530
}

# rules
rule replicated_ruleset {
	ruleset 0
	type replicated
	min_size 1
	max_size 10
	step take default
	step chooseleaf firstn 0 type host
	step emit
}

# end crush map

On Fri, 18 Nov 2016 at 20:48, Brian :: <bc@xxxxxxxx> wrote:

Hi Bruno

Do you only have 6 OSDs across the 5 nodes? You may have an issue with read or write errors on 1 OSD, and because there aren't many other OSDs to go to, this is going to cause the cluster pain.

Post your crush map and the experts here may be able to advise, but with a cluster of this size you may have issues getting it back to a healthy state if 1 OSD is causing problems.

On Fri, Nov 18, 2016 at 10:51 PM, Bruno Silva <bemanuel.pe@xxxxxxxxx> wrote:
> I have a cluster with 5 Ceph nodes. For some reason the sync went down and now I
> don't know what I can do to restore it.
> # ceph -s
>     cluster 338bc0a5-c2f7-4c0a-9b35-25c7afee50c6
>      health HEALTH_WARN
>             1 pgs down
>             6 pgs incomplete
>             6 pgs stuck inactive
>             6 pgs stuck unclean
>             3 requests are blocked > 32 sec
>             1 mons down, quorum 0,1,2,3 0,2,1,3
>      monmap e5: 5 mons at
> {0=xyxyxyxyx:6789/0,1=xyxyxyxyx:6789/0,2=xyxyxyxyx:6789/0,3=xyxyxyxyx:6789/0,4=xyxyxyxyx:6789/0}
>             election epoch 63162, quorum 0,1,2,3 0,2,1,3
>      osdmap e2575: 6 osds: 6 up, 6 in
>       pgmap v6105104: 128 pgs, 1 pools, 748 GB data, 188 kobjects
>             2217 GB used, 1072 GB / 3290 GB avail
>                  122 active+clean
>                    5 incomplete
>                    1 down+incomplete
>   client io 106 B/s wr, 0 op/s
>
>
> ceph -w
>     cluster 338bc0a5-c2f7-4c0a-9b35-25c7afee50c6
>      health HEALTH_WARN
>             1 pgs down
>             6 pgs incomplete
>             6 pgs stuck inactive
>             6 pgs stuck unclean
>             3 requests are blocked > 32 sec
>      monmap e5: 5 mons at
> {0=xyxyxyxyx:6789/0,1=xyxyxyxyx:6789/0,2=xyxyxyxyx:6789/0,3=xyxyxyxyx:6789/0,4=xyxyxyxyx:6789/0}
>             election epoch 63164, quorum 0,1,2,3,4 0,2,1,3,4
>      osdmap e2575: 6 osds: 6 up, 6 in
>       pgmap v6105130: 128 pgs, 1 pools, 748 GB data, 188 kobjects
>             2217 GB used, 1072 GB / 3290 GB avail
>                  122 active+clean
>                    5 incomplete
>                    1 down+incomplete
>   client io 1262 B/s wr, 0 op/s
>
> 2016-11-18 19:49:58.005806 mon.0 [INF] pgmap v6105130: 128 pgs: 1
> down+incomplete, 122 active+clean, 5 incomplete; 748 GB data, 2217 GB used,
> 1072 GB / 3290 GB avail; 1262 B/s wr, 0 op/s
> 2016-11-18 19:50:02.731566 mon.0 [INF] pgmap v6105131: 128 pgs: 1
> down+incomplete, 122 active+clean, 5 incomplete; 748 GB data, 2217 GB used,
> 1072 GB / 3290 GB avail; 1228 B/s wr, 0 op/s

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
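
For anyone digging through the archive, the outputs Goncalo asks for come straight from the stock ceph CLI. A minimal sketch, assuming it is run on a node with a working client.admin keyring; the /tmp paths are only example locations:

# ceph osd tree
# ceph osd df
# ceph osd dump

# ceph osd getcrushmap -o /tmp/crushmap.bin
# crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt

The last two commands pull the compiled crush map out of the cluster and decompile it into the text form quoted above, which is useful for confirming that the posted map matches what the monitors are actually using.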
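
Along the same lines, the 'incomplete' and 'down+incomplete' states in the 'ceph -s' output above can be narrowed down to specific PGs and OSDs with the standard status commands. A rough sketch, again assuming an admin node; the PG id 0.2a is only a placeholder for whatever the dump_stuck output actually lists:

# ceph health detail
# ceph pg dump_stuck inactive
# ceph pg dump_stuck unclean
# ceph pg 0.2a query
# ceph quorum_status

'ceph pg <pgid> query' reports which OSDs the PG is probing or waiting for, and 'ceph quorum_status' shows which of the five monitors is currently outside the quorum, which should explain the "1 mons down" warning in the first output.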