Hi Igor,

I can't access these drives. They have an OSD or LVM process hanging in D-state, and any attempt to do something with them gets stuck as well. I need to wait for recovery to finish and protect the OSDs that are still running from crashing just as badly. Once we have full redundancy again and service is back, I can add the setting osd_compact_on_start=true and start rebooting servers. Right now I need to prevent the ship from sinking.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Igor Fedotov <igor.fedotov@xxxxxxxx>
Sent: 06 October 2022 13:28:11
To: Frank Schilder; ceph-users@xxxxxxx
Subject: Re: OSD crashes during upgrade mimic->octopus

IIUC the OSDs that expose "had timed out after 15" are failing to start up. Is that correct, or did I miss something? I meant trying compaction for them...

On 10/6/2022 2:27 PM, Frank Schilder wrote:
> Hi Igor,
>
> thanks for your response.
>
>> And what's the target Octopus release?
> ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
>
> I'm afraid I don't have the luxury right now to take OSDs down or add extra load with an on-line compaction. I would really appreciate a way to make the OSDs more crash tolerant until I have full redundancy again. Is there a setting that increases the OPS timeout or is there a way to restrict the load to tolerable levels?
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Igor Fedotov <igor.fedotov@xxxxxxxx>
> Sent: 06 October 2022 13:15
> To: Frank Schilder; ceph-users@xxxxxxx
> Subject: Re: OSD crashes during upgrade mimic->octopus
>
> Hi Frank,
>
> you might want to compact RocksDB with ceph-kvstore-tool for those OSDs
> which are showing
>
> "heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f1886536700' had timed out after 15"
>
> I have seen this error fairly often after bulk data removal and the
> severe DB performance drop that follows.
>
> Thanks,
> Igor

--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
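
As a rough illustration of the compaction suggestion in this thread, the sketch below scans log lines for the heartbeat warning quoted above and builds, without running it, an offline compaction command for a given OSD. The function names and the data path /var/lib/ceph/osd/ceph-<id> are assumptions made for this example and may not match a particular deployment. Offline compaction with ceph-kvstore-tool needs the OSD to be stopped, so this would only apply once redundancy is restored.

```python
import re
from collections import Counter

# Matches the heartbeat warning quoted in the thread, for example:
# heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f1886536700' had timed out after 15
TIMEOUT_RE = re.compile(
    r"heartbeat_map is_healthy '(?P<pool>\S+) thread (?P<thread>0x[0-9a-f]+)' "
    r"had timed out after (?P<seconds>\d+)"
)


def count_timeouts(lines):
    """Count heartbeat timeout warnings per thread pool in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group("pool")] += 1
    return counts


def compaction_command(osd_id, data_root="/var/lib/ceph/osd"):
    """Build (but do not execute) an offline RocksDB compaction command.

    data_root is an assumed default location; the OSD must be stopped
    before the returned command is run.
    """
    return [
        "ceph-kvstore-tool",
        "bluestore-kv",
        f"{data_root}/ceph-{osd_id}",
        "compact",
    ]


if __name__ == "__main__":
    sample = [
        "heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f1886536700' "
        "had timed out after 15",
    ]
    print(count_timeouts(sample))
    # OSD id 12 is a made-up value for illustration only.
    print(" ".join(compaction_command(12)))
```

The command is returned as an argument list rather than executed, so it can be reviewed before being passed to something like subprocess.run on the host that owns the OSD.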