Re: osd removal problem

Sean Redmond <sean.redmond1@xxxxxxxxx> · Thu, 29 Dec 2016 13:06:06 +0000

Hi,
Hmm, could you try dumping the CRUSH map, decompiling it, editing it to remove the DNE OSDs, then compiling it and loading it back into Ceph?

http://docs.ceph.com/docs/master/rados/operations/crush-map/#get-a-crush-map
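
For reference, the round trip described above can be sketched as a command sequence. This is only a sketch: the file names (crush.bin, crush.txt, crush.new.bin) are placeholders made up for illustration, and the helper only builds the argument lists so they can be reviewed before anything is run against the cluster.

```python
from typing import List


def crush_roundtrip_commands(
    compiled: str = "crush.bin",        # hypothetical file names
    decompiled: str = "crush.txt",
    recompiled: str = "crush.new.bin",
) -> List[List[str]]:
    """Return the CLI steps: dump, decompile, (edit by hand), compile, load."""
    return [
        # 1. Dump the compiled CRUSH map from the cluster.
        ["ceph", "osd", "getcrushmap", "-o", compiled],
        # 2. Decompile it into editable text.
        ["crushtool", "-d", compiled, "-o", decompiled],
        # (Edit the decompiled text here to drop the DNE entries.)
        # 3. Compile the edited text again.
        ["crushtool", "-c", decompiled, "-o", recompiled],
        # 4. Load the new map back into the cluster.
        ["ceph", "osd", "setcrushmap", "-i", recompiled],
    ]


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for argv in crush_roundtrip_commands():
        print(" ".join(argv))
```

Keeping the original crush.bin untouched means it can be loaded again with the same setcrushmap step if the edited map misbehaves.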

Thanks

On Thu, Dec 29, 2016 at 1:01 PM, Łukasz Chrustek <skidoo@xxxxxxx> wrote:
Hi,

]# ceph osd tree
ID        WEIGHT    TYPE NAME             UP/DOWN REWEIGHT PRIMARY-AFFINITY
       -7  16.89590 root ssd-disks
      -11         0     host ssd1
598798032         0         osd.598798032     DNE        0
    21940         0         osd.21940         DNE        0
       71         0         osd.71            DNE        0

]# ceph osd rm osd.598798032
Error EINVAL: osd id 598798032 is too largeinvalid osd id-34

]# ceph osd rm osd.21940
osd.21940 does not exist.

]# ceph osd rm osd.71
osd.71 does not exist.
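
To list the stale entries without reading the tree by eye, a small parser along these lines could help. It is a sketch that assumes the column layout of the ceph osd tree output shown in this thread; find_dne_osds is a hypothetical helper, not part of Ceph.

```python
import re
from typing import List

# Matches rows such as: "   71   0   osd.71   DNE   0"
DNE_ROW = re.compile(r"^\s*(-?\d+)\s+\S+\s+(osd\.\d+)\s+DNE\b")


def find_dne_osds(tree_output: str) -> List[int]:
    """Return the IDs of all OSD rows whose status column is DNE."""
    ids = []
    for line in tree_output.splitlines():
        match = DNE_ROW.match(line)
        if match:
            ids.append(int(match.group(1)))
    return ids


SAMPLE = """\
ID        WEIGHT    TYPE NAME             UP/DOWN REWEIGHT PRIMARY-AFFINITY
       -7  16.89590 root ssd-disks
      -11         0     host ssd1
598798032         0         osd.598798032     DNE        0
    21940         0         osd.21940         DNE        0
       71         0         osd.71            DNE        0
"""

print(find_dne_osds(SAMPLE))  # prints [598798032, 21940, 71]
```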

> ceph osd rm osd.$ID

> On Thu, Dec 29, 2016 at 10:44 AM, Łukasz Chrustek <skidoo@xxxxxxx> wrote:

> Hi,

>  I was trying to delete 3 OSDs from the cluster. The deletion process took
>  a very long time and I interrupted it. The mon process then crashed, and in
>  ceph osd tree (after restarting ceph-mon) I saw:

>   ~]# ceph osd tree
>  ID         WEIGHT    TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY
>          -7  16.89590 root ssd-disks
>         -11         0     host ssd1
>  -231707408         0
>       22100         0         osd.22100        DNE        0
>          71         0         osd.71           DNE        0

>  When I tried to delete osd.22100:

>  [root@cc1 ~]# ceph osd crush remove osd.22100
>  device 'osd.22100' does not appear in the crush map

>  Then I tried to delete osd.71, and the mon process crashed:

>  [root@cc1 ~]# ceph osd crush remove osd.71
>  2016-12-28 17:52:34.459668 7f426a862700  0 monclient: hunting for new mon

>  After restarting ceph-mon, ceph osd tree shows:

>  # ceph osd tree
>  ID        WEIGHT    TYPE NAME             UP/DOWN REWEIGHT PRIMARY-AFFINITY
>         -7  16.89590 root ssd-disks
>        -11         0     host ssd1
>  598798032         0         osd.598798032     DNE        0
>      21940         0         osd.21940         DNE        0
>         71         0         osd.71            DNE        0

>  My question is how to delete these OSDs without directly editing the
>  crushmap. It is a production system and I can't afford any service
>  interruption :(. When I try ceph osd crush remove, ceph-mon crashes....

>  I dumped the crushmap, but it took 19G (!!) after decompiling (the compiled
>  file is very small). So I cleaned this file with perl (it took a very
>  long time), and I now have a small text crushmap, which I edited. But is
>  there any chance that ceph will still remember these huge OSD numbers
>  somewhere? Is it safe to apply this cleaned crushmap to the cluster? The
>  cluster works OK now, but there is over 23TB of production data
>  which I can't lose. Please advise what to do.

>  --

>  Regards

>  Luk

>  _______________________________________________

>  ceph-users mailing list

> ceph-users@xxxxxxxxxxxxxx

> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
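
The cleanup of the decompiled map described in the quoted message could also be done as a streaming filter, so the 19G text file never has to fit in memory. This is a minimal sketch under assumptions: it expects the decompiled text to use "device <id> <name>" lines in the devices section and "item osd.<id> ..." lines inside buckets, and the names clean_crush_lines, clean_file, bad_ids and max_valid_id are hypothetical. The thresholds must be chosen by hand for the cluster in question.

```python
import re
from typing import Iterable, Iterator, Set

# Assumed line shapes in the decompiled CRUSH text.
DEVICE_LINE = re.compile(r"^\s*device\s+(\d+)\s")
ITEM_LINE = re.compile(r"^\s*item\s+osd\.(\d+)\s")


def clean_crush_lines(
    lines: Iterable[str], bad_ids: Set[int], max_valid_id: int
) -> Iterator[str]:
    """Yield every line except device/item lines that refer to unwanted OSD IDs."""
    for line in lines:
        match = DEVICE_LINE.match(line) or ITEM_LINE.match(line)
        if match:
            osd_id = int(match.group(1))
            # Drop explicitly listed IDs and anything above the highest real OSD.
            if osd_id in bad_ids or osd_id > max_valid_id:
                continue
        yield line


def clean_file(src_path: str, dst_path: str, bad_ids: Set[int], max_valid_id: int) -> None:
    """Stream src_path to dst_path line by line, applying the filter."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(clean_crush_lines(src, bad_ids, max_valid_id))
```

The entries in this thread all have weight 0, so dropping them should not change any bucket weight; for items with a non-zero weight that would need to be checked separately before loading the map.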

--

Regards,

 Łukasz Chrustek
