Re: Bug in OSD Maps


 



I know this thread has been silent for a while; however, for various reasons I have been forced to work specifically on this issue this weekend.

As it turns out, you were partly right: the fix for this state is to use ceph-objectstore-tool. The answer was not to remove the PG in question, however, but to inject the missing OSD map epoch. Once the OSD has the required epoch, it can start successfully and resume downloading OSD maps through the normal mechanism.

As an example, take OSD id 123 on host storage1, which is missing epoch 9876.

On a monitor:
  ceph osd getmap 9876 > e9876

Copy the file e9876 from the monitor to storage1 using scp (or any other mechanism).

Then forcibly inject the epoch into the stopped OSD. Our system is configured with the cluster name txc1, so adjust the cluster name and paths for your own setup; your mileage may vary.

  sudo ceph-objectstore-tool --cluster=txc1 --data-path /var/lib/ceph/osd/txc1-123 --journal-path /var/lib/ceph/osd/txc1-123/journal --op set-osdmap --file /path/to/e9876 --epoch 9876 --force
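For anyone who wants to script this, the three steps above could be wrapped roughly as follows. This is only a sketch, not something I have run in this exact form: the function names, the run helper and the DRY_RUN switch are made up for illustration, and step 1 uses the -o option of ceph osd getmap instead of the shell redirect shown above. The ceph-objectstore-tool arguments are exactly the ones from the command above. With DRY_RUN=1 (the default here) each command is only printed, so it can be reviewed before anything touches the OSD.

```shell
#!/bin/sh
# Sketch of the recovery steps. Helper names and DRY_RUN are illustrative
# assumptions, not part of any Ceph tooling.
# DRY_RUN=1 prints each command instead of executing it; set DRY_RUN=0 to run.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1 (on a monitor): export the OSD map for one epoch to a file.
# usage: fetch_map EPOCH OUTFILE
fetch_map() {
    run ceph osd getmap "$1" -o "$2"
}

# Step 2: copy the exported map to the storage host.
# usage: copy_map FILE HOST DEST_PATH
copy_map() {
    run scp "$1" "$2:$3"
}

# Step 3 (on the storage host, with the OSD stopped): inject the map.
# usage: inject_map CLUSTER OSD_ID EPOCH FILE
inject_map() {
    osd_path="/var/lib/ceph/osd/$1-$2"
    run sudo ceph-objectstore-tool --cluster="$1" \
        --data-path "$osd_path" \
        --journal-path "$osd_path/journal" \
        --op set-osdmap --file "$4" --epoch "$3" --force
}
```

With the values from the example, that would be fetch_map 9876 e9876 on the monitor, copy_map e9876 storage1 /path/to/e9876 to move the file, and inject_map txc1 123 9876 /path/to/e9876 on storage1. Steps 1 and 3 run on different hosts, so the functions are meant to be sourced and called one at a time rather than run as a single script.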

I wanted to share this nugget of information for posterity, as I cannot be the only person who has run across this, and documentation on it appears to be limited (what documentation of ceph-objectstore-tool there is, is slightly inconsistent with the realities of its use). Thanks also to Wido for the poke in the right direction elsewhere; he filled in the missing bits.

Regards,

Stuart 


Stuart Harland
Infrastructure Engineer
Email: s.harland@xxxxxxxxxxxxxxxxxxxxxx
Tel: +44 (0) 207 183 1411




LiveLink Technology Ltd
McCormack House
56A East Street
Havant
PO9 1BS



On 26 May 2017, at 22:53, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:

Yeah, not sure. It might just be that the restarting is newly exposing old issues, but I don't see how. I gather from skimming that ticket that it was a disk state bug earlier on that was going undetected until Jewel, which is why I was wondering about the upgrades.


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
