Re: Stuck in upgrade process to reef


 



Hi Igor,

I tried starting only osd.1, which seemed to pass fsck, but
it crashed :-(

I searched the logs and found that I still have the logs from
22.12.2023, when I did the upgrade (I have logging set to journald).

Would you be interested in those logs? The file is 30 MB in
bzip2 format; how can I share it with you?

It also contains the crash log from starting osd.1, which I can
cut out and send to the list...
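
A minimal sketch of how the osd.1 part could be cut out of journald. It assumes the usual cephadm unit naming (ceph-<fsid>@osd.<id>.service) applies to this cluster and takes the fsid from the OSD paths quoted below; the function names osd_unit and osd_log are made up for illustration:

```shell
#!/usr/bin/env bash
# Sketch only: FSID comes from the OSD paths quoted in this thread.
# The unit naming (ceph-<fsid>@osd.<id>.service) is the cephadm
# convention and is assumed to hold for this cluster.
FSID="${FSID:-2c565e24-7850-47dc-a751-a6357cbbaf2a}"

# Print the systemd unit name for a given OSD id.
osd_unit() {
    printf 'ceph-%s@osd.%s.service\n' "$FSID" "$1"
}

# Print the journald entries of one OSD, starting at a given time.
# $1 = OSD id, $2 = start time in any format journalctl --since accepts.
osd_log() {
    journalctl -u "$(osd_unit "$1")" --since "$2" --no-pager
}
```

Calling osd_log 1 with the time of the start attempt and piping the result through bzip2 should give a much smaller file than the full 30 MB log.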

Sincerely
Jan Marek

On Thu, Jan 04, 2024 at 02:43:48 CET, Jan Marek wrote:
> Hi Igor,
> 
> I ran this one-liner:
> 
> for i in {0..12}; do export CEPH_ARGS="--log-file osd."${i}".log --debug-bluestore 5/20" ; ceph-bluestore-tool --path /var/lib/ceph/2c565e24-7850-47dc-a751-a6357cbbaf2a/osd.${i} --command fsck ; done;
> 
> On osd.0 it crashed very quickly, on osd.1 it is still working.
> 
> I've sent those logs in one e-mail.
> 
> But!
> 
> I tried to list the disk devices in the monitor view and got a
> very interesting screenshot; I've marked the relevant parts with
> red rectangles.
> 
> I also got the JSON from syslog, which was part of a cephadm call,
> and there it looks correct (to my eyes).
> 
> Could this be related to the problem?
> 
> Sincerely
> Jan Marek
> 
> On Thu, Jan 04, 2024 at 12:32:47 CET, Igor Fedotov wrote:
> > Hi Jan,
> > 
> > may I see the fsck logs from all the failing OSDs, to look for a pattern?
> > IIUC the whole node is suffering from the issue, right?
> > 
> > 
> > Thanks,
> > 
> > Igor
> > 
> > On 1/2/2024 10:53 AM, Jan Marek wrote:
> > > Hello once again,
> > > 
> > > I've tried this:
> > > 
> > > export CEPH_ARGS="--log-file /tmp/osd.0.log --debug-bluestore 5/20"
> > > ceph-bluestore-tool --path /var/lib/ceph/2c565e24-7850-47dc-a751-a6357cbbaf2a/osd.0 --command fsck
> > > 
> > > I'm sending the /tmp/osd.0.log file attached.
> > > 
> > > Sincerely
> > > Jan Marek
> > > 
> > > On Sun, Dec 31, 2023 at 12:38:13 CET, Igor Fedotov wrote:
> > > > Hi Jan,
> > > > 
> > > > this doesn't look like RocksDB corruption but rather like some BlueStore
> > > > metadata inconsistency. Also, the assertion backtrace in the new log looks
> > > > completely different from the original one. So, in an attempt to find a
> > > > systematic pattern, I'd suggest running fsck with verbose logging for every
> > > > failing OSD. Relevant command line:
> > > > 
> > > > export CEPH_ARGS="--log-file osd.N.log --debug-bluestore 5/20"
> > > > bin/ceph-bluestore-tool --path <path-to-osd> --command fsck
> > > > 
> > > > This is unlikely to fix anything; it's rather a way to collect logs to get
> > > > better insight.
> > > > 
> > > > 
> > > > Additionally you might want to run similar fsck for a couple of healthy OSDs
> > > > - curious if it succeeds as I have a feeling that the problem with crashing
> > > > OSDs had been hidden before the upgrade and revealed rather than caused by
> > > > it.
> > > > 
> > > > 
> > > > Thanks,
> > > > 
> > > > Igor
> > > > 
> > > > On 12/29/2023 3:28 PM, Jan Marek wrote:
> > > > > Hello Igor,
> > > > > 
> > > > > I'm attaching the part of syslog created while starting OSD.0.
> > > > > 
> > > > > Many thanks for your help.
> > > > > 
> > > > > Sincerely
> > > > > Jan Marek
> > > > > 
> > > > > On Wed, Dec 27, 2023 at 04:42:56 CET, Igor Fedotov wrote:
> > > > > > Hi Jan,
> > > > > > 
> > > > > > IIUC the attached log is for ceph-kvstore-tool, right?
> > > > > > 
> > > > > > Can you please share the full OSD startup log as well?
> > > > > > 
> > > > > > 
> > > > > > Thanks,
> > > > > > 
> > > > > > Igor
> > > > > > 
> > > > > > On 12/27/2023 4:30 PM, Jan Marek wrote:
> > > > > > > Hello,
> > > > > > > 
> > > > > > > I have a problem with my Ceph cluster (3 mon nodes and 6 OSD
> > > > > > > nodes; every OSD node has 12 rotational disks and one NVMe device
> > > > > > > for the BlueStore DB). Ceph is installed by the Ceph orchestrator
> > > > > > > and uses bluefs storage on the OSDs.
> > > > > > > 
> > > > > > > I started the upgrade from version 17.2.6 to 18.2.1 by
> > > > > > > invoking:
> > > > > > > 
> > > > > > > ceph orch upgrade start --ceph-version 18.2.1
> > > > > > > 
> > > > > > > After upgrading the mon and mgr processes, the orchestrator tried
> > > > > > > to upgrade the first OSD node, but its OSDs are going down.
> > > > > > > 
> > > > > > > I've stopped the upgrade, but I have one OSD node
> > > > > > > completely down.
> > > > > > > 
> > > > > > > After the upgrade I got some error messages and found
> > > > > > > /var/lib/ceph/crashxxxx directories; I'm attaching the files
> > > > > > > I found there to this message.
> > > > > > > 
> > > > > > > Please, can you advise what I can do now? It seems that RocksDB
> > > > > > > is either incompatible or corrupted :-(
> > > > > > > 
> > > > > > > Thanks in advance.
> > > > > > > 
> > > > > > > Sincerely
> > > > > > > Jan Marek
> > > > > > > 
> > > > > > > _______________________________________________
> > > > > > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > > > > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> > > > > > -- 
> > > > > > Igor Fedotov
> > > > > > Ceph Lead Developer
> > > > > > 
> > > > > > Looking for help with your Ceph cluster? Contact us at https://croit.io
> > > > > > 
> > > > > > croit GmbH, Freseniusstr. 31h, 81247 Munich
> > > > > > CEO: Martin Verges - VAT-ID: DE310638492
> > > > > > Com. register: Amtsgericht Munich HRB 231263
> > > > > > Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
> > > > > > 
> > 
> 
> -- 
> Ing. Jan Marek
> University of South Bohemia
> Academic Computer Centre
> Phone: +420389032080
> http://www.gnu.org/philosophy/no-word-attachments.cs.html








-- 
Ing. Jan Marek
University of South Bohemia
Academic Computer Centre
Phone: +420389032080
http://www.gnu.org/philosophy/no-word-attachments.cs.html
