Hi Jan,
indeed, the fsck logs for the OSDs other than osd.0 look good, so it would be
interesting to see the OSD startup logs for them, preferably for several
(e.g. 3-4) OSDs, to identify the pattern.
The original upgrade log(s) would be nice to see as well.
You might want to use Google Drive or any other publicly available
file-sharing site for that.
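A minimal sketch for pulling those startup logs out of journald. It assumes cephadm runs each OSD as the systemd unit ceph-<fsid>@osd.<id>.service, and it reuses the FSID from Jan's commands. The helper name collect_osd_logs and the JOURNALCTL override are hypothetical and only there for illustration. Adjust the time window to cover the actual start attempts.

```shell
# Hypothetical helper (not part of Ceph): dump the journald log of several
# cephadm-managed OSDs for one time window, one file per OSD.
# Assumes cephadm's systemd unit naming ceph-<fsid>@osd.<id>.service.
FSID=2c565e24-7850-47dc-a751-a6357cbbaf2a
JOURNALCTL=${JOURNALCTL:-journalctl}

# Usage: collect_osd_logs START END ID...
collect_osd_logs() {
    start=$1
    end=$2
    shift 2
    for id in "$@"; do
        # One log file per OSD, with ISO timestamps so the files line up.
        "$JOURNALCTL" --no-pager -o short-iso \
            -u "ceph-${FSID}@osd.${id}.service" \
            --since "$start" --until "$end" > "osd.${id}.startup.log"
    done
}

# Example (as root on the OSD node), then compress before uploading:
#   collect_osd_logs "2024-01-05 09:00" "2024-01-05 10:30" 1 2 3 4
#   tar cjf osd-startup-logs.tar.bz2 osd.*.startup.log
```

Keeping one file per OSD makes it easier to compare the crash pattern across daemons.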
Thanks,
Igor
On 05/01/2024 10:25, Jan Marek wrote:
Hi Igor,
I tried starting only osd.1, which seemed to pass fsck OK, but
it crashed :-(
I searched the logs and found that I still have logs from 22.12.2023,
when I did the upgrade (I have logging set to journald).
Would you be interested in those logs? The file is 30 MB in
bzip2 format; how can I share it with you?
It also contains the crash log from starting osd.1, but I can cut
that part out and send it to the list...
Sincerely
Jan Marek
On Thu, Jan 04, 2024 at 02:43:48 CET, Jan Marek wrote:
Hi Igor,
I ran this one-liner:
for i in {0..12}; do export CEPH_ARGS="--log-file osd."${i}".log --debug-bluestore 5/20" ; ceph-bluestore-tool --path /var/lib/ceph/2c565e24-7850-47dc-a751-a6357cbbaf2a/osd.${i} --command fsck ; done;
On osd.0 it crashed very quickly; on osd.1 it is still running.
I'll send those logs in one e-mail.
But!
I tried listing the disk devices in the monitor view and got a
very interesting screenshot; I've highlighted some parts with red
rectangles.
I also got a JSON from syslog, produced as part of a cephadm call,
which looks correct to my eyes.
Could this be related to the problem?
Sincerely
Jan Marek
On Thu, Jan 04, 2024 at 12:32:47 CET, Igor Fedotov wrote:
Hi Jan,
may I see the fsck logs from all the failing OSDs, to look for a pattern? IIUC
the whole node is suffering from the issue, right?
Thanks,
Igor
On 1/2/2024 10:53 AM, Jan Marek wrote:
Hello once again,
I've tried this:
export CEPH_ARGS="--log-file /tmp/osd.0.log --debug-bluestore 5/20"
ceph-bluestore-tool --path /var/lib/ceph/2c565e24-7850-47dc-a751-a6357cbbaf2a/osd.0 --command fsck
I'm attaching the /tmp/osd.0.log file.
Sincerely
Jan Marek
On Sun, Dec 31, 2023 at 12:38:13 CET, Igor Fedotov wrote:
Hi Jan,
this doesn't look like RocksDB corruption but rather like a BlueStore
metadata inconsistency. Also, the assertion backtrace in the new log looks
completely different from the original one. So, in an attempt to find a
systematic pattern, I'd suggest running fsck with verbose logging for every
failing OSD. Relevant command line:
CEPH_ARGS="--log-file osd.N.log --debug-bluestore 5/20" \
bin/ceph-bluestore-tool --path <path-to-osd> --command fsck
This is unlikely to fix anything; it's rather a way to collect logs for
better insight.
Additionally, you might want to run a similar fsck on a couple of healthy
OSDs. I'm curious whether it succeeds, as I have a feeling that the problem
with the crashing OSDs had been hidden before the upgrade and was revealed,
rather than caused, by it.
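The fsck runs for the failing and the healthy OSDs can be batched in a single loop. This is a minimal sketch under several assumptions. The OSD daemons must be stopped. ceph-bluestore-tool must be available on the host, as in Jan's commands. A non-zero exit status is taken to mean that fsck found errors or crashed. The names fsck_osds and BLUESTORE_TOOL are hypothetical.

```shell
# Hypothetical helper: fsck each listed OSD with verbose BlueStore logging
# (one log file per OSD) and print a one-line pass/fail summary per OSD.
# Assumes the OSDs are stopped and that a non-zero exit status from
# ceph-bluestore-tool means fsck found errors or crashed.
FSID=2c565e24-7850-47dc-a751-a6357cbbaf2a
BLUESTORE_TOOL=${BLUESTORE_TOOL:-ceph-bluestore-tool}

# Usage: fsck_osds ID...
fsck_osds() {
    for id in "$@"; do
        # CEPH_ARGS is passed only to this invocation; console output
        # goes to osd.<id>.fsck.out, the verbose log to osd.<id>.log.
        if CEPH_ARGS="--log-file osd.${id}.log --debug-bluestore 5/20" \
            "$BLUESTORE_TOOL" --path "/var/lib/ceph/${FSID}/osd.${id}" \
            --command fsck > "osd.${id}.fsck.out" 2>&1; then
            echo "osd.${id}: PASS"
        else
            # $? here is still the exit status of the fsck command above.
            echo "osd.${id}: FAIL (exit $?)"
        fi
    done
}
```

For example, "fsck_osds 0 1 5 6" would mix failing and healthy OSDs in one run, so the summary lines can be compared directly.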
Thanks,
Igor
On 12/29/2023 3:28 PM, Jan Marek wrote:
Hello Igor,
I'm attaching the part of syslog created while starting osd.0.
Many thanks for your help.
Sincerely
Jan Marek
On Wed, Dec 27, 2023 at 04:42:56 CET, Igor Fedotov wrote:
Hi Jan,
IIUC the attached log is for ceph-kvstore-tool, right?
Can you please share the full OSD startup log as well?
Thanks,
Igor
On 12/27/2023 4:30 PM, Jan Marek wrote:
Hello,
I have a problem with my Ceph cluster (3x mon nodes, 6x OSD nodes; every
OSD node has 12 rotational disks and one NVMe device for the
BlueStore DB). Ceph is deployed by the Ceph orchestrator and uses
BlueFS storage on the OSDs.
I started the upgrade from version 17.2.6 to 18.2.1 by
invoking:
ceph orch upgrade start --ceph-version 18.2.1
After upgrading the mon and mgr daemons, the orchestrator tried to
upgrade the first OSD node, but its OSDs kept crashing.
I stopped the upgrade, but I have one OSD node
completely down.
After the upgrade I got some error messages and found
/var/lib/ceph/crashxxxx directories; I'm attaching the files I
found there to this message.
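If the cephadm crash agent has already posted those reports to the cluster, they can also be collected with the ceph crash commands. This is a minimal sketch. It assumes that the plain-text output of "ceph crash ls-new" starts with a header line and lists the crash ID in the first column. The names dump_new_crashes and CEPH are hypothetical.

```shell
# Hypothetical helper: save the details of every crash report the cluster
# has not archived yet, one JSON file per crash, ready to attach.
# Assumes `ceph crash ls-new` prints a header line first and the crash ID
# in the first column of each following line.
CEPH=${CEPH:-ceph}

dump_new_crashes() {
    # Skip the header and any blank lines, keep only the ID column.
    "$CEPH" crash ls-new | awk 'NR > 1 && NF { print $1 }' |
        while read -r crash_id; do
            "$CEPH" crash info "$crash_id" > "crash-${crash_id}.json"
        done
}
```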
Can you please advise what I can do now? It seems that RocksDB
is either incompatible or corrupted :-(
Thanks in advance.
Sincerely
Jan Marek
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
--
Igor Fedotov
Ceph Lead Developer
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
--
Ing. Jan Marek
University of South Bohemia
Academic Computer Centre
Phone: +420389032080
http://www.gnu.org/philosophy/no-word-attachments.cs.html