Hi,

>>
>> # ceph osd tree
>> ID        WEIGHT   TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY
>> -7        16.89590 root ssd-disks
>> -11              0     host ssd1
>> 598798032        0         osd.598798032    DNE        0
> Yikes!

Yes... indeed, I don't like this number either...

>> 21940            0         osd.21940        DNE        0
>> 71               0         osd.71           DNE        0
>>
>> My question is how to delete these OSDs without directly editing the
>> crushmap? It is a production system, I can't afford any service
>> interruption :( and when I try 'ceph osd crush remove', ceph-mon crashes...
>>
>> I dumped the crushmap, but it was 19G (!!) after decompiling (the
>> compiled file is very small). So I cleaned this file with perl (it took
>> a very long time), and I now have a small txt crushmap, which I edited.
>> But is there any chance that ceph will still remember these huge OSD
>> numbers somewhere? Is it safe to apply this cleaned crushmap to the
>> cluster?

> It sounds like the problem is the OSDMap, not CRUSH per se. Can you
> attach the output from 'ceph osd dump -f json-pretty'?

It's quite big so I put it on pastebin: http://pastebin.com/Unkk2Pa7

> Do you know how osd.598798032 got created? Or osd.21940 for that matter.
> OSD ids should be small since they are stored internally by OSDMap as a
> vector. This is probably why your mon is crashing.

[root@cc1 /etc/ceph]# ceph osd tree
ID  WEIGHT   TYPE NAME               UP/DOWN REWEIGHT PRIMARY-AFFINITY
 -7 16.89590 root ssd-intel-s3700
-11        0     host ssd-stor1
 69        0         osd.69             down        0          1.00000
 70        0         osd.70             down        0          1.00000
 71        0         osd.71             down        0          1.00000

This is the moment when it happened:

]# for i in `seq 69 71`; do ceph osd crush remove osd.$i; done
removed item id 69 name 'osd.69' from crush map
removed item id 70 name 'osd.70' from crush map

here I pressed ctrl+c

2016-12-28 17:38:10.055239 7f4576d7a700  0 monclient: hunting for new mon
2016-12-28 17:38:10.055582 7f4574233700  0 -- 192.168.128.1:0/1201679761 >> 192.168.128.2:6789/0 pipe(0x7f456c023190 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f456c024470).fault
2016-12-28 17:38:30.550622 7f4574233700  0 -- 192.168.128.1:0/1201679761 >> 192.168.128.1:6789/0 pipe(0x7f45600008c0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4560001df0).fault
2016-12-28 17:38:54.551031 7f4574474700  0 -- 192.168.128.1:0/1201679761 >> 192.168.128.2:6789/0 pipe(0x7f45600046c0 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f45600042b0).fault

after restart of ceph-mon:

]# ceph osd tree
ID          WEIGHT   TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY
         -7 16.89590 root ssd-intel-s3700
        -11        0     host ssd-stor1
 -231707408        0
      22100        0         osd.22100        DNE        0
         71        0         osd.71           DNE        0

and later:

[root@cc1 ~]# ceph osd crush remove osd.22100
device 'osd.22100' does not appear in the crush map
[root@cc1 ~]# ceph osd crush remove osd.71
2016-12-28 17:52:34.459668 7f426a862700  0 monclient: hunting for new mon
2016-12-28 17:52:55.238418 7f426a862700  0 monclient: hunting for new mon
2016-12-28 17:52:55.238680 7f4262ebc700  0 -- 192.168.128.1:0/692048545 >> 192.168.128.2:6789/0 pipe(0x7f4254028300 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4254026800).fault

and after another restart of ceph-mon:

]# ceph osd tree
ID        WEIGHT   TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY
       -7 16.89590 root ssd-intel-s3700
      -11        0     host ssd-stor1
598798032        0         osd.598798032    DNE        0
    21940        0         osd.21940        DNE        0
       71        0         osd.71           DNE        0

--
Regards
Luk
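[Editor's note: for reference, the offline crushmap edit cycle described above usually looks roughly like the sketch below; the file names are placeholders, and, as Sage points out, if the bogus ids also live in the OSDMap then replacing the crush map alone may not stop the mon crashes.]

    # extract the current (compiled) crush map from the cluster
    ceph osd getcrushmap -o crushmap.bin

    # decompile it to text for editing
    crushtool -d crushmap.bin -o crushmap.txt

    # ... edit crushmap.txt, removing the bogus osd entries ...

    # recompile the edited map
    crushtool -c crushmap.txt -o crushmap.new

    # sanity-check mappings of the new map before injecting it
    crushtool -i crushmap.new --test --show-statistics

    # inject it back into the cluster
    ceph osd setcrushmap -i crushmap.new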