Hello,

In the process of redeploying some OSDs in our cluster, after destroying one of them (marking it down and out, and removing it from the crushmap) and trying to redeploy it (crush add, start), we reach a state where the OSD gets stuck in the booting state:

root@staging-rd0-02:~# ceph daemon osd.12 status
{ "cluster_fsid": "XXXXXXXXXXX",
  "osd_fsid": "XXXXXXXXXXXXXX",
  "whoami": 12,
  "state": "booting",
  "oldest_map": 150201,
  "newest_map": 150779,
  "num_pgs": 0}

No flags that could prevent the OSD from coming up are in place. The OSD never gets marked up in 'ceph osd tree' and never gets in. If I try to mark it in manually, it gets marked out again after a while. The cluster OSD map keeps moving forward, but of course the OSD cannot catch up.

I started the OSD with the following debugging options:

debug osd = 20
debug filestore = 20
debug journal = 20
debug monc = 20
debug ms = 1

and what I see is a continuing stream of OSD log entries of this kind:

2016-06-15 16:39:33.876339 7f0256b61700 10 osd.12 150798 do_waiters -- start
2016-06-15 16:39:33.876343 7f0256b61700 10 osd.12 150798 do_waiters -- finish
2016-06-15 16:39:34.390560 7f022e2ee700 20 osd.12 150798 update_osd_stat osd_stat(59384 kB used, 558 GB avail, 558 GB total, peers []/[] op hist [])
2016-06-15 16:39:34.390622 7f022e2ee700 5 osd.12 150798 heartbeat: osd_stat(59384 kB used, 558 GB avail, 558 GB total, peers []/[] op hist [])
2016-06-15 16:39:34.876526 7f0256b61700 5 osd.12 150798 tick
2016-06-15 16:39:34.876561 7f0256b61700 10 osd.12 150798 do_waiters -- start
2016-06-15 16:39:34.876565 7f0256b61700 10 osd.12 150798 do_waiters -- finish
2016-06-15 16:39:35.876729 7f0256b61700 5 osd.12 150798 tick
2016-06-15 16:39:35.876762 7f0256b61700 10 osd.12 150798 do_waiters -- start
2016-06-15 16:39:35.876766 7f0256b61700 10 osd.12 150798 do_waiters -- finish
2016-06-15 16:39:36.646355 7f025535e700 20 filestore(/rados/staging-rd0-02-12) sync_entry woke after 30.000161
2016-06-15 16:39:36.646421 7f025535e700 20 filestore(/rados/staging-rd0-02-12) sync_entry waiting for max_interval 30.000000
2016-06-15 16:39:36.876917 7f0256b61700 5 osd.12 150798 tick
2016-06-15 16:39:36.876949 7f0256b61700 10 osd.12 150798 do_waiters -- start
2016-06-15 16:39:36.876953 7f0256b61700 10 osd.12 150798 do_waiters -- finish
2016-06-15 16:39:37.877112 7f0256b61700 5 osd.12 150798 tick
2016-06-15 16:39:37.877142 7f0256b61700 10 osd.12 150798 do_waiters -- start
2016-06-15 16:39:37.877147 7f0256b61700 10 osd.12 150798 do_waiters -- finish
2016-06-15 16:39:38.877298 7f0256b61700 5 osd.12 150798 tick
2016-06-15 16:39:38.877327 7f0256b61700 10 osd.12 150798 do_waiters -- start
2016-06-15 16:39:38.877331 7f0256b61700 10 osd.12 150798 do_waiters -- finish

Is there a solution to this problem? Is it a known bug? We are on firefly (0.80.11) and wanted to do some maintenance before upgrading to hammer, but now we are somewhat stuck.

Best regards,
Kostis
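
P.S. For completeness, a sketch of the destroy/redeploy sequence described above, as we ran it for osd.12 (the crush weight here is a placeholder, and we start the daemon with the stock firefly sysvinit script):

# destroy: mark the OSD down and out, then drop it from the crushmap
ceph osd down 12
ceph osd out 12
ceph osd crush remove osd.12

# redeploy: add it back to the crushmap under its host and start the daemon
ceph osd crush add osd.12 0.55 host=staging-rd0-02
/etc/init.d/ceph start osd.12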