Re: Stuck in upgrade process to reef

Hi Marek,

I haven't looked through those upgrade logs yet, but here are some comments regarding the last OSD startup attempt.

First of all, answering your question:

_init_alloc::NCB::restore_allocator() failed! Run Full Recovery from ONodes (might take a while)

Is it a mandatory part of fsck?

This is caused by a previous non-graceful OSD process shutdown: BlueStore is unable to find an up-to-date allocation map and recovers it from RocksDB. And since fsck is a read-only procedure, the recovered allocmap is not saved; hence every following BlueStore startup (within fsck or OSD init) triggers another rebuild attempt. To avoid that, you might want to run repair instead of fsck: this will persist an up-to-date allocation map and avoid rebuilding it on the next startup. This will only work until the next non-graceful shutdown, so an unsuccessful OSD start attempt might break the allocmap state again.
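For reference, a minimal sketch of the repair invocation, using the OSD path that appears later in this thread (the leading `echo` is a dry-run guard; drop it to actually run the command, and make sure the OSD is stopped first):

```shell
# Path taken from this thread; substitute your own cluster fsid / OSD id.
OSD_PATH=/var/lib/ceph/2c565e24-7850-47dc-a751-a6357cbbaf2a/osd.1
# Unlike read-only fsck, repair persists the rebuilt allocation map,
# so the next startup does not have to recover it from RocksDB again.
echo ceph-bluestore-tool --path "$OSD_PATH" --command repair
```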

Secondly, looking at the OSD startup log, one can see that the actual OSD log ends with that allocmap recovery as well:

2024-01-09T11:25:30.718449+01:00 osd1 ceph-osd[1734062]: bluestore(/var/lib/ceph/osd/ceph-1) _init_alloc::NCB::restore_allocator() failed! Run Full Recovery from ONodes (might take a while) ...

The subsequent log line indicating OSD daemon termination is from systemd:
2024-01-09T11:25:33.516258+01:00 osd1 systemd[1]: Stopping ceph-2c565e24-7850-47dc-a751-a6357cbbaf2a@osd.1.service - Ceph osd.1 for 2c565e24-7850-47dc-a751-a6357cbbaf2a...

And honestly, these lines provide almost no clue why the termination happened; no obvious OSD failures are shown. Perhaps the containerized environment hides the details, e.g. by cutting off the tail of the OSD log.
So you might want to continue the investigation by running repair prior to starting the OSD, as per the above. This will avoid the allocmap recovery and hopefully work around the startup problem, if the issue is indeed caused by that recovery.
Additionally, you might want to increase the debug_bluestore log level for osd.1 before starting it up, to get more insight into what's happening.
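A sketch of raising the debug level centrally via the config database, assuming a cephadm-managed cluster where `ceph config set` is available (the `echo` is a dry-run guard; drop it to actually apply the setting):

```shell
# Raise BlueStore debug verbosity for osd.1 only; 5/20 matches the
# level used for the fsck runs earlier in this thread.
echo ceph config set osd.1 debug_bluestore 5/20
# Revert after the investigation:
echo ceph config rm osd.1 debug_bluestore
```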

Alternatively, you might want to play with the OSD log target settings to write osd.1's log to a file rather than using the system-wide logging infrastructure; hopefully this will be more helpful.
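A sketch of redirecting osd.1's log to a plain file, using the standard `log_to_file` / `log_file` options (the file path is illustrative, and the `echo` is a dry-run guard):

```shell
# Write osd.1's log to a file instead of (or in addition to) journald.
echo ceph config set osd.1 log_to_file true
# Illustrative path; adjust to your environment.
echo ceph config set osd.1 log_file /var/log/ceph/osd.1.log
```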

Thanks,
Igor

On 09/01/2024 13:31, Jan Marek wrote:
Hi Igor,

I've sent you the logs via filesender.cesnet.cz; if anyone is interested, they are here:

https://filesender.cesnet.cz/?s=download&token=047b1ec4-4df0-4e8a-90fc-31706eb168a4

Some points:

1) I've found that the osd1 server had the wrong time (3 minutes in
the future). I've corrected that. Yes, I know that's bad, but we
moved the servers to a different network segment where they have no
access to the time servers on the Internet, so I had to reconfigure
them to use our own NTP servers.

2) I've tried to start the osd.1 service with this sequence:

a)

ceph-bluestore-tool --path /var/lib/ceph/2c565e24-7850-47dc-a751-a6357cbbaf2a/osd.1 --command fsck

(without setting the log properly :-( )

b)

export CEPH_ARGS="--log-file osd.1.log --debug-bluestore 5/20"
ceph-bluestore-tool --path /var/lib/ceph/2c565e24-7850-47dc-a751-a6357cbbaf2a/osd.1 --command fsck

- here I have one question: why is this line still in the log:

_init_alloc::NCB::restore_allocator() failed! Run Full Recovery from ONodes (might take a while)

Is it a mandatory part of fsck?

Log is attached.

c)

systemctl start ceph-2c565e24-7850-47dc-a751-a6357cbbaf2a@osd.1.service

still crashing, gzip-ed log attached too.
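Since logging goes to journald here, the tail of the crashed startup can be pulled back with something like the following sketch (the unit name is the one from the systemctl command above; `echo` is a dry-run guard):

```shell
# Unit name from this thread; substitute your own fsid / OSD id.
UNIT=ceph-2c565e24-7850-47dc-a751-a6357cbbaf2a@osd.1.service
# Last 200 journal lines for the failing unit
echo journalctl -u "$UNIT" -n 200 --no-pager
# The crash module keeps backtraces from crashed daemons as well
echo ceph crash ls
```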

Many thanks for looking into the problem.

Sincerely
Jan Marek

On Mon, Jan 08, 2024 at 12:00:05 CET, Igor Fedotov wrote:
Hi Jan,

indeed, the fsck logs for the OSDs other than osd.0 look good, so it would be
interesting to see the OSD startup logs for them. Preferably for
multiple (e.g. 3-4) OSDs, to get the pattern.

The original upgrade log(s) would be nice to see as well.

You might want to use Google Drive or any other publicly available
file-sharing site for that.


Thanks,

Igor

On 05/01/2024 10:25, Jan Marek wrote:
Hi Igor,

I've tried to start only osd.1, which seemed to be fsck'd OK, but
it crashed :-(

I've searched the logs and found that I have logs from 22.12.2023,
when I did the upgrade (I have logging set to journald).

Would you be interested in those logs? The file is 30MB in
bzip2 format; how can I share it with you?

It contains the crash log from starting osd.1 too, but I can cut
that out and send it to the list...

Sincerely
Jan Marek

On Thu, Jan 04, 2024 at 02:43:48 CET, Jan Marek wrote:
Hi Igor,

I've run this one-liner:

for i in {0..12}; do export CEPH_ARGS="--log-file osd.${i}.log --debug-bluestore 5/20"; ceph-bluestore-tool --path /var/lib/ceph/2c565e24-7850-47dc-a751-a6357cbbaf2a/osd.${i} --command fsck; done

On osd.0 it crashed very quickly; on osd.1 it is still running.

I've sent those logs in one e-mail.

But!

I've tried to list the disk devices in the monitor view, and I got
a very interesting screenshot; some parts I've emphasized with red
rectangles.

I've got a JSON from syslog, which was part of a cephadm call, and
it seems to be correct (to my eyes).

Could this be connected to this problem?

Sincerely
Jan Marek

On Thu, Jan 04, 2024 at 12:32:47 CET, Igor Fedotov wrote:
Hi Jan,

may I see the fsck logs from all the failing OSDs, to see the pattern? IIUC
the full node is suffering from the issue, right?


Thanks,

Igor

On 1/2/2024 10:53 AM, Jan Marek wrote:
Hello once again,

I've tried this:

export CEPH_ARGS="--log-file /tmp/osd.0.log --debug-bluestore 5/20"
ceph-bluestore-tool --path /var/lib/ceph/2c565e24-7850-47dc-a751-a6357cbbaf2a/osd.0 --command fsck

I'm sending the /tmp/osd.0.log file attached.

Sincerely
Jan Marek

On Sun, Dec 31, 2023 at 12:38:13 CET, Igor Fedotov wrote:
Hi Jan,

this doesn't look like RocksDB corruption, but rather like some BlueStore
metadata inconsistency. Also, the assertion backtrace in the new log looks
completely different from the original one. So, in an attempt to find a
systematic pattern, I'd suggest running fsck with verbose logging for every
failing OSD. The relevant command line:

CEPH_ARGS="--log-file osd.N.log --debug-bluestore 5/20"
bin/ceph-bluestore-tool --path <path-to-osd> --command fsck

This is unlikely to fix anything; it's rather a way to collect logs to get
better insight.


Additionally, you might want to run a similar fsck for a couple of healthy OSDs.
I'm curious whether it succeeds, as I have a feeling that the problem with the
crashing OSDs had been hidden before the upgrade, and was revealed rather than
caused by it.


Thanks,

Igor

On 12/29/2023 3:28 PM, Jan Marek wrote:
Hello Igor,

I'm attaching a part of the syslog created while starting osd.0.

Many thanks for help.

Sincerely
Jan Marek

On Wed, Dec 27, 2023 at 04:42:56 CET, Igor Fedotov wrote:
Hi Jan,

IIUC the attached log is for ceph-kvstore-tool, right?

Can you please share full OSD startup log as well?


Thanks,

Igor

On 12/27/2023 4:30 PM, Jan Marek wrote:
Hello,

I have a problem with my Ceph cluster (3x mon nodes, 6x OSD nodes;
every OSD node has 12 rotational disks and one NVMe device for the
BlueStore DB). Ceph was installed by the ceph orchestrator and uses
BlueFS storage on the OSDs.

I started the upgrade process from version 17.2.6 to 18.2.1 by
invoking:

ceph orch upgrade start --ceph-version 18.2.1

After the upgrade of the mon and mgr processes, the orchestrator
tried to upgrade the first OSD node, but its OSDs are falling down.

I've stopped the upgrade process, but I have 1 OSD node
completely down.
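For completeness, the orchestrator upgrade can be inspected and paused with the standard orch commands (a sketch; `echo` is a dry-run guard, drop it to actually run them):

```shell
# Show progress / errors of the running upgrade
echo ceph orch upgrade status
# Stop the upgrade without rolling anything back
echo ceph orch upgrade stop
```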

After the upgrade I got some error messages and found
/var/lib/ceph/crashxxxx directories; I'm attaching the files
I found there to this message.

Please, can you advise what I can do now? It seems that RocksDB
is either incompatible or corrupted :-(

Thanks in advance.

Sincerely
Jan Marek

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx

--
Ing. Jan Marek
University of South Bohemia
Academic Computer Centre
Phone: +420389032080
http://www.gnu.org/philosophy/no-word-attachments.cs.html







