Re: Bug with Cephadm module osd service preventing orchestrator start

Just to get some background information: did you remove OSDs while performing the upgrade, or did you start the OSD removal and then start the upgrade? Upgrades should be started with a healthy cluster, but of course that can’t always be guaranteed; OSDs and/or entire hosts can obviously also fail during an upgrade. I’m just trying to understand what could cause this (I haven’t upgraded production clusters to Reef yet, only test clusters). Have you stopped the upgrade to cancel the process entirely? Can you share this information, please:

ceph versions
ceph orch upgrade status
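
Also, since the traceback below points at the stored OSD removal queue, it might be worth checking what cephadm has persisted in the mon config-key store. A rough sketch, assuming the queue lives under mgr/cephadm/osd_remove_queue (I haven’t verified that key name against your version):

ceph config-key ls | grep cephadm
# the key name below is my assumption, adjust it to whatever the ls output shows
ceph config-key get mgr/cephadm/osd_remove_queue

If that JSON contains original_weight entries, that would at least confirm where the offending value is stored.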

Quoting Benjamin Huth <benjaminmhuth@xxxxxxxxx>:

Just wanted to follow up on this; I am unfortunately still stuck and can't find
where the JSON for this value is stored. I'm wondering if I should attempt to
build a manager container with the code reverted to before the commit that
introduced the original_weight argument. Please let me know if you have any
thoughts.

Thank you!

On Wed, Aug 14, 2024, 7:37 PM Benjamin Huth <benjaminmhuth@xxxxxxxxx> wrote:

Hey there, so I went to upgrade my Ceph cluster from 18.2.2 to 18.2.4 and
encountered a problem with my managers. After they had been upgraded, my
ceph orch module broke because the cephadm module would not load. This
obviously halted the upgrade, because you can't really upgrade without the
orchestrator. Here are the logs related to why the cephadm module fails to
start:

https://pastebin.com/SzHbEDVA

and the relevant part is here:

"backtrace": [

" File \\"/usr/share/ceph/mgr/cephadm/module.py\\", line 591, in
__init__\\n self.to_remove_osds.load_from_store()",

" File \\"/usr/share/ceph/mgr/cephadm/services/osd.py\\", line 918, in
load_from_store\\n osd_obj = OSD.from_json(osd, rm_util=self.rm_util)",

" File \\"/usr/share/ceph/mgr/cephadm/services/osd.py\\", line 783, in
from_json\\n return cls(**inp)",

"TypeError: __init__() got an unexpected keyword argument
'original_weight'"

]

Unfortunately, I am at a loss as to what passes the original_weight argument
here. I have attempted to migrate back to 18.2.2 and successfully redeployed a
manager of that version, but it has the same issue with the cephadm module. I
believe this may be because I recently started several OSD drains and then
canceled them, causing this to manifest once the managers restarted.
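
I assume the cephadm module persists that removal queue somewhere in the mon config-key store, so my plan was to grep the dump for the field to confirm which key carries it (just a guess on my part, untested):

ceph config-key dump | grep original_weight

but I wasn't sure what to do with whatever key turns up.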

I went through a good bit of the source and found the module at fault:

https://github.com/ceph/ceph/blob/e0dd396793b679922e487332a2a4bc48e024a42f/src/pybind/mgr/cephadm/services/osd.py#L779

as well as the commit that caused the issue:

https://github.com/ceph/ceph/commit/ba7fac074fb5ad072fcad10862f75c0a26a7591d

I unfortunately am not familiar enough with the ceph source to find the
ceph-config values I need to delete or smart enough to fix this myself. Any
help would be super appreciated.
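
The only idea I have come up with (completely untested, so please tell me if this is a bad idea) is to back up and then remove whatever config-key holds the stored removal queue and fail over the manager, roughly:

# <queue-key> is a placeholder for whichever key actually holds the queue JSON
ceph config-key get <queue-key> > osd_remove_queue_backup.json
ceph config-key rm <queue-key>
ceph mgr fail

but I don't know whether that is safe to do here.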

Thanks!

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



