I know this thread has been silent for a while, but for various reasons I was forced to work on this specific issue this weekend.

As it turns out, you were partly right: the fix is to use ceph-objectstore-tool. However, the answer was not to remove the PG in question but to inject the missing OSD map epoch. Once the OSD has the required epoch, it starts successfully and resumes downloading OSD maps through the normal mechanism.

As an example, for OSD id 123 on storage1 with missing epoch 9876:

On a monitor:

    ceph osd getmap 9876 > e9876

SCP (or otherwise copy) the file e9876 from the monitor to storage1.

Then forcibly inject the epoch into the not-running OSD (our system is configured with cluster name txc1, so your mileage may vary):

    sudo ceph-objectstore-tool --cluster=txc1 --data-path /var/lib/ceph/osd/txc1-123 --journal-path /var/lib/ceph/osd/txc1-123/journal --op set-osdmap --file /path/to/e9876 --epoch 9876 --force

I wanted to share this nugget of information for posterity, as I cannot be the only person who has run across this, and there appears to be limited documentation on it (and what documentation of ceph-objectstore-tool exists is slightly inconsistent with the realities of its use).

Thanks also to Wido for the poke in the right direction elsewhere, as he filled in the missing bits.

Regards,
Stuart
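
P.S. For anyone who wants to adapt the commands above to a different OSD or epoch, here is a rough sketch with the values pulled out into variables. It only prints the two commands rather than running them, so nothing touches the cluster. The variable and function names (CLUSTER, OSD_ID, EPOCH, MAP_FILE, fetch_cmd, inject_cmd) are made up for illustration and are not Ceph settings; the paths assume the same layout as in the example.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of executing them.
# All names below are illustrative; adjust the values to your own cluster.
CLUSTER=txc1
OSD_ID=123
EPOCH=9876
MAP_FILE="/path/to/e${EPOCH}"
OSD_DIR="/var/lib/ceph/osd/${CLUSTER}-${OSD_ID}"

# Step 1 (run on a monitor): export the missing epoch to a file.
fetch_cmd() {
    printf 'ceph osd getmap %s > e%s\n' "$EPOCH" "$EPOCH"
}

# Step 2 (run on the storage host, with the OSD stopped): inject the epoch.
inject_cmd() {
    printf 'sudo ceph-objectstore-tool --cluster=%s --data-path %s --journal-path %s/journal --op set-osdmap --file %s --epoch %s --force\n' \
        "$CLUSTER" "$OSD_DIR" "$OSD_DIR" "$MAP_FILE" "$EPOCH"
}

fetch_cmd
inject_cmd
```

Review the printed commands before running anything by hand; copying the map file between hosts is still a manual step.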