On Thu, 29 Dec 2016, Łukasz Chrustek wrote:
> Hi,
>
> I was trying to delete 3 OSDs from the cluster. The deletion process took a
> very long time and I interrupted it. The mon process then crashed, and in
> 'ceph osd tree' (after restarting ceph-mon) I saw:
>
> ~]# ceph osd tree
> ID           WEIGHT   TYPE NAME           UP/DOWN REWEIGHT PRIMARY-AFFINITY
>  -7          16.89590 root ssd-disks
> -11                 0     host ssd1
> -231707408          0
>  22100              0         osd.22100       DNE        0
>  71                 0         osd.71          DNE        0
>
>
> When I tried to delete osd.22100:
>
> [root@cc1 ~]# ceph osd crush remove osd.22100
> device 'osd.22100' does not appear in the crush map
>
> then I tried to delete osd.71 and the mon process crashed:
>
> [root@cc1 ~]# ceph osd crush remove osd.71
> 2016-12-28 17:52:34.459668 7f426a862700  0 monclient: hunting for new mon
>
> After restarting ceph-mon, 'ceph osd tree' shows:
>
> # ceph osd tree
> ID          WEIGHT   TYPE NAME             UP/DOWN REWEIGHT PRIMARY-AFFINITY
>  -7         16.89590 root ssd-disks
> -11                0     host ssd1
>  598798032         0         osd.598798032     DNE        0

Yikes!

>  21940             0         osd.21940         DNE        0
>  71                0         osd.71            DNE        0
>
> My question is: how can I delete these OSDs without directly editing the
> crushmap? It is a production system, I can't afford any service
> interruption :(, and when I try 'ceph osd crush remove' the ceph-mon
> crashes....
>
> I dumped the crushmap, but it came to 19G (!!) after decompiling (the
> compiled file is very small). So I cleaned this file with perl (it took a
> very long time), and I now have a small txt crushmap, which I edited. But
> is there any chance that ceph will still remember these huge OSD numbers
> somewhere? Is it safe to apply this cleaned crushmap to the cluster?

It sounds like the problem is the OSDMap, not CRUSH per se.  Can you
attach the output from 'ceph osd dump -f json-pretty'?

Do you know how osd.598798032 got created?  Or osd.21940 for that matter.
OSD ids should be small since they are stored internally by OSDMap as a
vector.  This is probably why your mon is crashing.

sage
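
To illustrate the point about OSD ids being stored in vectors, here is a
minimal standalone sketch (not Ceph code; the "4 bytes of per-OSD state" is
an assumption for the arithmetic) of what an id like 598798032 implies when
per-OSD data is kept in vectors sized to the largest id:

// Minimal sketch, not Ceph code: if per-OSD state is kept in vectors
// indexed by OSD id, each vector must grow to max_osd_id + 1 entries.
#include <cstdint>
#include <iostream>

int main() {
    const int64_t bogus_id = 598798032;        // id seen in 'ceph osd tree' above
    const int64_t entries  = bogus_id + 1;     // indices 0..bogus_id must exist
    const double  gib      = entries * 4.0 / (1024.0 * 1024.0 * 1024.0);
    std::cout << entries << " entries -> ~" << gib
              << " GiB for a single 4-byte-per-OSD vector\n";
    // Several such vectors (up/in state, weights, addresses, ...) multiply
    // that cost, which is consistent with the mon falling over on these ids.
    return 0;
}

The program only prints the implied allocation size rather than actually
allocating it; the real OSDMap fields differ, but the scaling argument, and
the reason a bogus ~600-million id is fatal, is the same.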