Hi Igor,

I tried starting only osd.1, which seems to have passed fsck, but it crashed :-(

I searched the logs and found that I still have logs from 22.12.2023, when I did the upgrade (I have logging set to journald). Would you be interested in those logs? The file is 30 MB in bzip2 format; how can I share it with you?

It also contains the crash log from starting osd.1, but I can cut that part out and send it to the list...

Sincerely
Jan Marek

On Thu, Jan 04, 2024 at 02:43:48 CET, Jan Marek wrote:
> Hi Igor,
>
> I ran this one-liner:
>
> for i in {0..12}; do export CEPH_ARGS="--log-file osd."${i}".log --debug-bluestore 5/20" ; ceph-bluestore-tool --path /var/lib/ceph/2c565e24-7850-47dc-a751-a6357cbbaf2a/osd.${i} --command fsck ; done;
>
> On osd.0 it crashed very quickly; on osd.1 it is still running.
>
> I've sent those logs in one e-mail.
>
> But!
>
> I tried to list the disk devices in the monitor view, and I got a
> very interesting screenshot; I've highlighted some parts with red
> rectangles.
>
> I also got a JSON from syslog, which was part of a cephadm call,
> and there it seems to be correct (to my eyes).
>
> Could this be connected with the problem?
>
> Sincerely
> Jan Marek
>
> On Thu, Jan 04, 2024 at 12:32:47 CET, Igor Fedotov wrote:
> > Hi Jan,
> >
> > may I see the fsck logs from all the failing OSDs to see the pattern? IIUC
> > the full node is suffering from the issue, right?
> >
> > Thanks,
> >
> > Igor
> >
> > On 1/2/2024 10:53 AM, Jan Marek wrote:
> > > Hello once again,
> > >
> > > I've tried this:
> > >
> > > export CEPH_ARGS="--log-file /tmp/osd.0.log --debug-bluestore 5/20"
> > > ceph-bluestore-tool --path /var/lib/ceph/2c565e24-7850-47dc-a751-a6357cbbaf2a/osd.0 --command fsck
> > >
> > > And I'm sending the /tmp/osd.0.log file attached.
> > >
> > > Sincerely
> > > Jan Marek
> > >
> > > On Sun, Dec 31, 2023 at 12:38:13 CET, Igor Fedotov wrote:
> > > > Hi Jan,
> > > >
> > > > this doesn't look like RocksDB corruption but rather like some BlueStore
> > > > metadata inconsistency. Also, the assertion backtrace in the new log looks
> > > > completely different from the original one. So in an attempt to find any
> > > > systematic pattern I'd suggest running fsck with verbose logging for every
> > > > failing OSD. Relevant command line:
> > > >
> > > > CEPH_ARGS="--log-file osd.N.log --debug-bluestore 5/20"
> > > > bin/ceph-bluestore-tool --path <path-to-osd> --command fsck
> > > >
> > > > This is unlikely to fix anything; it's rather a way to collect logs to get
> > > > better insight.
> > > >
> > > > Additionally, you might want to run a similar fsck for a couple of healthy OSDs
> > > > - curious if it succeeds, as I have a feeling that the problem with the crashing
> > > > OSDs had been hidden before the upgrade and was revealed rather than caused by
> > > > it.
> > > >
> > > > Thanks,
> > > >
> > > > Igor
> > > >
> > > > On 12/29/2023 3:28 PM, Jan Marek wrote:
> > > > > Hello Igor,
> > > > >
> > > > > I'm attaching the part of syslog created while starting OSD.0.
> > > > >
> > > > > Many thanks for help.
> > > > >
> > > > > Sincerely
> > > > > Jan Marek
> > > > >
> > > > > On Wed, Dec 27, 2023 at 04:42:56 CET, Igor Fedotov wrote:
> > > > > > Hi Jan,
> > > > > >
> > > > > > IIUC the attached log is for ceph-kvstore-tool, right?
> > > > > >
> > > > > > Can you please share the full OSD startup log as well?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Igor
> > > > > >
> > > > > > On 12/27/2023 4:30 PM, Jan Marek wrote:
> > > > > > > Hello,
> > > > > > >
> > > > > > > I have a problem with my Ceph cluster (3x mon nodes, 6x osd nodes;
> > > > > > > every osd node has 12 rotational disks and one NVMe device for the
> > > > > > > BlueStore DB). Ceph is installed by the ceph orchestrator and has
> > > > > > > bluefs storage on the OSDs.
> > > > > > >
> > > > > > > I started the upgrade from version 17.2.6 to 18.2.1 by
> > > > > > > invoking:
> > > > > > >
> > > > > > > ceph orch upgrade start --ceph-version 18.2.1
> > > > > > >
> > > > > > > After upgrading the mon and mgr processes, the orchestrator tried to
> > > > > > > upgrade the first OSD node, but its OSDs keep going down.
> > > > > > >
> > > > > > > I've stopped the upgrade, but I have 1 osd node
> > > > > > > completely down.
> > > > > > >
> > > > > > > After the upgrade I got some error messages and found
> > > > > > > /var/lib/ceph/crashxxxx directories; I'm attaching to this message
> > > > > > > the files I found there.
> > > > > > >
> > > > > > > Please, can you advise what I can do now? It seems that RocksDB
> > > > > > > is either incompatible or corrupted :-(
> > > > > > >
> > > > > > > Thanks in advance.
> > > > > > >
> > > > > > > Sincerely
> > > > > > > Jan Marek
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > > > > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> > > > > > --
> > > > > > Igor Fedotov
> > > > > > Ceph Lead Developer
> > > > > >
> > > > > > Looking for help with your Ceph cluster? Contact us at https://croit.io
> > > > > >
> > > > > > croit GmbH, Freseniusstr. 31h, 81247 Munich
> > > > > > CEO: Martin Verges - VAT-ID: DE310638492
> > > > > > Com. register: Amtsgericht Munich HRB 231263
> > > > > > Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx

--
Ing. Jan Marek
University of South Bohemia
Academic Computer Centre
Phone: +420389032080
http://www.gnu.org/philosophy/no-word-attachments.cs.html
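
For readers following the thread, here is a minimal sketch of the per-OSD fsck loop discussed above, with each run's exit status recorded so that failing OSDs are easy to pick out. The cluster path, the OSD id range 0..12 and the `--log-file` / `--debug-bluestore 5/20` arguments are copied from the messages above; the `DRY_RUN` switch and the `run_fsck` helper are hypothetical names introduced only for this illustration. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: verbose BlueStore fsck for every OSD on one node.
# FSID and the id range come from the thread; DRY_RUN and run_fsck
# are hypothetical helpers, not part of any Ceph tooling.

FSID="2c565e24-7850-47dc-a751-a6357cbbaf2a"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = really run fsck

# Run (or print) fsck for one OSD id, logging to osd.<id>.log
# in the current directory.
run_fsck() {
  id="$1"
  path="/var/lib/ceph/${FSID}/osd.${id}"
  args="--log-file osd.${id}.log --debug-bluestore 5/20"
  if [ "$DRY_RUN" = "1" ]; then
    echo "CEPH_ARGS='${args}' ceph-bluestore-tool --path ${path} --command fsck"
    return 0
  fi
  # CEPH_ARGS is set for this one command only, so nothing leaks
  # into the next iteration.
  CEPH_ARGS="$args" ceph-bluestore-tool --path "$path" --command fsck
}

i=0
while [ "$i" -le 12 ]; do
  run_fsck "$i"
  rc=$?
  # Report only the OSDs whose fsck did not exit cleanly.
  if [ "$rc" -ne 0 ]; then
    echo "osd.${i}: fsck exited with status ${rc}"
  fi
  i=$((i + 1))
done
```

Running it with `DRY_RUN=0` on the affected node (with the OSDs stopped) would execute the real commands. Setting `CEPH_ARGS` per command rather than with `export` is a design choice of this sketch, not something the thread prescribes.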
_______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx