Re: Stuck in upgrade process to reef

Hi Jan,

Regarding osd.0 - if this is the only occurrence, then I'd propose simply redeploying the OSD. This looks like some BlueStore metadata inconsistency which could have occurred long before the upgrade; most likely the upgrade just revealed it. And honestly, I can hardly imagine how to investigate it further at this point.
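
If the cluster is managed by cephadm (which the podman containers suggest), the redeployment would be roughly along these lines - treat it as a sketch, since the exact OSD id and device handling depend on your setup:

    ceph orch osd rm 0 --replace --zap
    ceph orch osd rm status    # watch the draining/removal progress

Once draining and removal finish, the device gets zapped and your existing OSD service spec should pick it up and recreate osd.0.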

Let's see how further upgrades go and come back to this question if more similar issues pop up.

Meanwhile I'd recommend running fsck on every OSD prior to the upgrade to get a clear understanding of whether its metadata is consistent or not.
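
With the OSD stopped, that's a ceph-bluestore-tool run; in a containerized deployment you would invoke it from inside the OSD's container, e.g. something like (adjust the OSD id for each daemon):

    ceph orch daemon stop osd.0
    cephadm shell --name osd.0 -- ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-0

If it reports errors, sort them out (or plan a redeploy) before that OSD gets upgraded.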

This way, if it occurs once again, we can prove or disprove my statement above about the issue being unrelated to the upgrade.


Thanks,

Igor

On 17/01/2024 15:07, Jan Marek wrote:
Hi Igor,

many thanks for advice!

I've tried to start osd.1 and it started; now it's
resynchronizing data.

I will start daemons one-by-one.

What about osd.0, which has a problem with
BlueStore fsck? Is there a way to repair it?

Sincerely
Jan


On Tue, Jan 16, 2024 at 08:15:03 CET, Igor Fedotov wrote:
Hi Jan,

I've just fired an upstream ticket for your case, see
https://tracker.ceph.com/issues/64053 for more details.


You might want to tune (or preferably just remove) your custom
bluestore_cache_.*_ratio settings to fix the issue.
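
If those ratios live in the cluster configuration (your ceph config dump output will show the exact option names and section), removing them would look roughly like this - the option names below are the usual candidates, not necessarily the ones you actually have set:

    ceph config rm osd bluestore_cache_meta_ratio
    ceph config rm osd bluestore_cache_kv_ratio

followed by a restart of the affected OSDs.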

This is reproducible and fixable in my lab this way.

Hope this helps.


Thanks,

Igor


On 15/01/2024 12:54, Jan Marek wrote:
Hi Igor,

I've tried to start the ceph-osd daemon as you advised me, and I'm
sending the log osd.1.start.log.

About memory: according to 'top', the podman ceph daemon doesn't reach
2% of the whole server memory (64 GB)...

I have switched on memory autotuning...

For my ceph config dump, see the attached dump.txt.

Sincerely
Jan Marek

On Thu, Jan 11, 2024 at 04:02:02 CET, Igor Fedotov wrote:
Hi Jan,

unfortunately this wasn't very helpful. Moreover, the log looks a bit messy -
it looks like a mixture of output from multiple running instances or
something. I'm not an expert in containerized setups, though.

Could you please simplify things by running the ceph-osd process manually, like
you did for ceph-objectstore-tool, and force the log output to a file. The command
line should look something like the following:

ceph-osd -i 0 --log-to-file --log-file <some-file> --debug-bluestore 5/20
--debug-prioritycache 10

Please don't forget to run repair prior to that.
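
With the OSD down, the repair is again a ceph-bluestore-tool invocation, e.g. something along these lines (path adjusted to the OSD in question, run inside its container on a containerized setup):

    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-0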


Also you haven't answered my questions about custom [memory] settings and
RAM usage during OSD startup. It would be nice to hear some feedback.


Thanks,

Igor

On 11/01/2024 16:47, Jan Marek wrote:
Hi Igor,

I've tried to start osd.1 with debug_prioritycache and
debug_bluestore set to 5/20; see the attached file...

Sincerely
Jan

On Wed, Jan 10, 2024 at 01:03:07 CET, Igor Fedotov wrote:
Hi Jan,

indeed this looks like some memory allocation problem - maybe the OSD's RAM
usage threshold was reached or something?

I'm curious whether you have any custom OSD settings or maybe memory caps on the
Ceph containers?
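
You can check both quickly, e.g. with something like:

    ceph config dump | grep -Ei 'osd_memory|bluestore_cache'    # cluster-side overrides
    podman stats --no-stream                                    # per-container memory limits and usage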

Could you please set debug_bluestore to 5/20 and debug_prioritycache to 10
and try to start the OSD once again. Please monitor the process's RAM usage while
it starts and share the resulting log.
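
For example (assuming the affected daemon is osd.1 and that you set the debug levels centrally before restarting it - adjust as needed):

    ceph config set osd.1 debug_bluestore 5/20
    ceph config set osd.1 debug_prioritycache 10
    ceph orch daemon restart osd.1

and then watch its memory with top or podman stats while it starts up.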


Thanks,

Igor

On 10/01/2024 11:20, Jan Marek wrote:
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



