Hi Igor,

I've run this one-liner:

for i in {0..12}; do export CEPH_ARGS="--log-file osd."${i}".log --debug-bluestore 5/20" ; ceph-bluestore-tool --path /var/lib/ceph/2c565e24-7850-47dc-a751-a6357cbbaf2a/osd.${i} --command fsck ; done;

On osd.0 it crashed very quickly; on osd.1 it is still running. I've sent those logs in one e-mail.
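For reference, here is the same loop written out a bit more explicitly, collecting the exit code of each fsck in one place (just a sketch - the fsid and the OSD ID range are taken from my one-liner above, and fsck-summary.txt is only an example name):

# Sketch: verbose-logging fsck over all OSDs of this node, recording exit codes.
# Assumptions: OSD IDs 0..12 and the fsid below match this node; each OSD must be stopped first.
fsid=2c565e24-7850-47dc-a751-a6357cbbaf2a
for i in {0..12}; do
  CEPH_ARGS="--log-file osd.${i}.log --debug-bluestore 5/20" \
    ceph-bluestore-tool --path /var/lib/ceph/${fsid}/osd.${i} --command fsck
  echo "osd.${i} fsck exit code: $?" | tee -a fsck-summary.txt
done

A quick look at fsck-summary.txt then shows which OSDs pass and which fail.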
But! I've tried to list the disk devices in the monitor view, and I got a very interesting screenshot - I've highlighted some parts with red rectangles. I've also got a JSON from syslog, which was produced as part of a cephadm call, and it seems to be correct (to my eyes). Could this be connected to the problem, or is it just a coincidence?

Sincerely
Jan Marek

On Thu, Jan 04, 2024 at 12:32:47 CET, Igor Fedotov wrote:
> Hi Jan,
>
> may I see the fsck logs from all the failing OSDs, to look for a pattern? IIUC
> the full node is suffering from the issue, right?
>
>
> Thanks,
>
> Igor
>
> On 1/2/2024 10:53 AM, Jan Marek wrote:
> > Hello once again,
> >
> > I've tried this:
> >
> > export CEPH_ARGS="--log-file /tmp/osd.0.log --debug-bluestore 5/20"
> > ceph-bluestore-tool --path /var/lib/ceph/2c565e24-7850-47dc-a751-a6357cbbaf2a/osd.0 --command fsck
> >
> > I'm sending the /tmp/osd.0.log file attached.
> >
> > Sincerely
> > Jan Marek
> >
> > On Sun, Dec 31, 2023 at 12:38:13 CET, Igor Fedotov wrote:
> > > Hi Jan,
> > >
> > > this doesn't look like RocksDB corruption but rather like some BlueStore
> > > metadata inconsistency. Also, the assertion backtrace in the new log looks
> > > completely different from the original one. So, in an attempt to find any
> > > systematic pattern, I'd suggest running fsck with verbose logging for every
> > > failing OSD. The relevant command line:
> > >
> > > CEPH_ARGS="--log-file osd.N.log --debug-bluestore 5/20"
> > > bin/ceph-bluestore-tool --path <path-to-osd> --command fsck
> > >
> > > This is unlikely to fix anything; it's rather a way to collect logs to get
> > > better insight.
> > >
> > >
> > > Additionally, you might want to run a similar fsck for a couple of healthy OSDs
> > > - I'm curious whether it succeeds, as I have a feeling that the problem with the
> > > crashing OSDs had been hidden before the upgrade and was revealed rather than
> > > caused by it.
> > >
> > >
> > > Thanks,
> > >
> > > Igor
> > >
> > > On 12/29/2023 3:28 PM, Jan Marek wrote:
> > > > Hello Igor,
> > > >
> > > > I'm attaching a part of the syslog created while starting OSD.0.
> > > >
> > > > Many thanks for your help.
> > > >
> > > > Sincerely
> > > > Jan Marek
> > > >
> > > > On Wed, Dec 27, 2023 at 04:42:56 CET, Igor Fedotov wrote:
> > > > > Hi Jan,
> > > > >
> > > > > IIUC the attached log is for ceph-kvstore-tool, right?
> > > > >
> > > > > Can you please share the full OSD startup log as well?
> > > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Igor
> > > > >
> > > > > On 12/27/2023 4:30 PM, Jan Marek wrote:
> > > > > > Hello,
> > > > > >
> > > > > > I have a problem with my Ceph cluster (3x mon nodes, 6x OSD nodes; every
> > > > > > OSD node has 12 rotational disks and one NVMe device for the
> > > > > > BlueStore DB). Ceph is installed by the ceph orchestrator, and the
> > > > > > OSDs use BlueFS storage.
> > > > > >
> > > > > > I've started the upgrade process from version 17.2.6 to 18.2.1 by
> > > > > > invoking:
> > > > > >
> > > > > > ceph orch upgrade start --ceph-version 18.2.1
> > > > > >
> > > > > > After the upgrade of the mon and mgr processes, the orchestrator tried to
> > > > > > upgrade the first OSD node, but its OSDs keep falling down.
> > > > > >
> > > > > > I've stopped the upgrade process, but I have 1 OSD node
> > > > > > completely down.
> > > > > >
> > > > > > After the upgrade I got some error messages and found
> > > > > > /var/lib/ceph/crashxxxx directories; I'm attaching the files
> > > > > > which I found there to this message.
> > > > > >
> > > > > > Please, can you advise what I can do now? It seems that RocksDB
> > > > > > is either incompatible or corrupted :-(
> > > > > >
> > > > > > Thanks in advance.
> > > > > >
> > > > > > Sincerely
> > > > > > Jan Marek
> > > > > >
> > > > > > _______________________________________________
> > > > > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > > > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> > > > > --
> > > > > Igor Fedotov
> > > > > Ceph Lead Developer
> > > > >
> > > > > Looking for help with your Ceph cluster? Contact us at https://croit.io
> > > > >
> > > > > croit GmbH, Freseniusstr. 31h, 81247 Munich
> > > > > CEO: Martin Verges - VAT-ID: DE310638492
> > > > > Com. register: Amtsgericht Munich HRB 231263
> > > > > Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
> > > >
> > > --
> > > Igor Fedotov
> > > Ceph Lead Developer
> > >
> > > Looking for help with your Ceph cluster? Contact us at https://croit.io
> > >
> > > croit GmbH, Freseniusstr. 31h, 81247 Munich
> > > CEO: Martin Verges - VAT-ID: DE310638492
> > > Com. register: Amtsgericht Munich HRB 231263
> > > Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
> > > _______________________________________________
> > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >
> --
> Igor Fedotov
> Ceph Lead Developer
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Ing. Jan Marek
University of South Bohemia
Academic Computer Centre
Phone: +420389032080
http://www.gnu.org/philosophy/no-word-attachments.cs.html
Attachment:
Screenshot_20240104_142222-red.png
Description: PNG image
Attachment:
detection.json
Description: application/json
Attachment:
signature.asc
Description: PGP signature
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx