On 12-11-2024 09:29, Eugen Block wrote:
Hi Torkil,
Hi Eugen
this sounds suspiciously like https://tracker.ceph.com/issues/67329
Do you have the same (or similar) stack trace in the mgr log pointing to
osd_remove_queue? You seem to have removed some OSDs, that would fit the
description as well...
Indeed, I had just put a host into drain, and there's this in the log:
"
2024-11-12T08:10:48.390+0000 7f1b2e088640 -1 mgr load Failed to construct class in 'cephadm'
2024-11-12T08:10:48.390+0000 7f1b2e088640 -1 mgr load Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/module.py", line 619, in __init__
    self.to_remove_osds.load_from_store()
  File "/usr/share/ceph/mgr/cephadm/services/osd.py", line 924, in load_from_store
    osd_obj = OSD.from_json(osd, rm_util=self.rm_util)
  File "/usr/share/ceph/mgr/cephadm/services/osd.py", line 789, in from_json
    return cls(**inp)
TypeError: __init__() got an unexpected keyword argument 'original_weight'
2024-11-12T08:10:48.392+0000 7f1b2e088640 -1 mgr operator() Failed to run module in active mode ('cephadm')
"
It's not clear to me from the tracker how to recover, though. The issue
seems to be resolved upstream, so should I just be able to pull new
container images somehow?
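If that key does contain the stale entries, would recovery be as simple as backing the key up, removing it, and failing the mgr over so cephadm gets constructed without the old queue? Just a sketch based on the traceback, not something I've verified:
"
# keep a copy of the stored queue before touching it (file name is arbitrary)
ceph config-key get mgr/cephadm/osd_remove_queue > osd_remove_queue.backup.json
# drop the entry the module trips over on load
ceph config-key rm mgr/cephadm/osd_remove_queue
# fail over the active mgr so the cephadm module is constructed again
ceph mgr fail
"
I assume that would also throw away the queued removals from the drain, so the drain would presumably have to be re-issued once the orchestrator is back, and upgrading to an image that carries the fix should keep it from happening again.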
Best regards,
Torkil
Regards,
Eugen
Quoting Torkil Svensgaard <torkil@xxxxxxxx>:
Hi
Ceph 18.2.4.
After failing over the active manager, ceph orch commands seem to have
stopped working. There's this in the mgr log:
"
2024-11-12T08:16:30.136+0000 7f1b2d887640 0 log_channel(audit) log [DBG] : from='client.2088861125 -' entity='client.admin' cmd=[{"prefix": "orch osd rm status", "target": ["mon-mgr", ""]}]: dispatch
2024-11-12T08:16:30.136+0000 7f1b23cf4640 -1 no module 'cephadm'
2024-11-12T08:16:30.136+0000 7f1b23cf4640 -1 no module 'cephadm'
2024-11-12T08:16:30.136+0000 7f1b23cf4640 -1 mgr.server reply reply (2) No such file or directory Module not found
"
The module is still enabled:
"
[root@ceph-flash1 ~]# ceph mgr module ls
MODULE
balancer on (always on)
crash on (always on)
devicehealth on (always on)
orchestrator on (always on)
pg_autoscaler on (always on)
progress on (always on)
rbd_support on (always on)
status on (always on)
telemetry on (always on)
volumes on (always on)
alerts on
cephadm on
dashboard on
insights on
iostat on
nfs on
prometheus on
stats on
diskprediction_local -
influx -
k8sevents -
localpool -
mds_autoscaler -
mirroring -
osd_perf_query -
osd_support -
restful -
rgw -
rook -
selftest -
snap_schedule -
telegraf -
test_orchestrator -
zabbix -
"
The cluster itself is otherwise working:
"
[root@ceph-flash1 ~]# ceph -s
  cluster:
    id:     8ee2d228-ed21-4580-8bbf-0649f229e21d
    health: HEALTH_WARN
            noout flag(s) set
            5 nearfull osd(s)
            Degraded data redundancy: 22742466/3557557778 objects degraded (0.639%), 559 pgs degraded, 559 pgs undersized
            4 pool(s) nearfull

  services:
    mon: 5 daemons, quorum ceph-flash1,ceph-flash2,ceph-flash3,grouchy,klutzy (age 8d)
    mgr: ceph-flash2.utlhuz(active, since 10m), standbys: ceph-flash3.ciudre, ceph-flash1.erhakb
    mds: 1/1 daemons up, 2 standby
    osd: 567 osds: 555 up (since 14h), 555 in (since 4d); 2689 remapped pgs
         flags noout

  data:
    volumes: 1/1 healthy
    pools:   17 pools, 15521 pgs
    objects: 619.72M objects, 1.3 PiB
    usage:   2.3 PiB used, 2.0 PiB / 4.3 PiB avail
    pgs:     22742466/3557557778 objects degraded (0.639%)
             135987731/3557557778 objects misplaced (3.823%)
             12832 active+clean
             2111  active+remapped+backfill_wait
             479   active+undersized+degraded+remapped+backfill_wait
             80    active+undersized+degraded+remapped+backfilling
             19    active+remapped+backfilling

  io:
    client:   73 MiB/s rd, 5.4 MiB/s wr, 574 op/s rd, 169 op/s wr
    recovery: 2.9 GiB/s, 1.01k objects/s
"
Suggestions? I tried failing over the manager again, which didn't help.
Best regards,
Torkil
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx