Hi,
I would probably stop the upgrade for now; the paused upgrade might be blocking cephadm. Then try again to redeploy a daemon. If it still fails, check the cephadm.log(s) on the respective servers as well as the log of the active mgr.
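The order suggested above (stop the upgrade, then retry the redeploy) can be sketched as a small helper. It reads the JSON that `ceph orch upgrade status` prints (the format quoted further down) and returns the commands to run in sequence. `plan_recovery` is a hypothetical name used only for illustration. The only assumption is that an upgrade with `"in_progress": true`, even a paused one, should be stopped before redeploying daemons.

```python
import json


def plan_recovery(upgrade_status: dict, daemons: list[str]) -> list[str]:
    """Return ceph CLI commands to run, in order (hypothetical helper)."""
    commands = []
    # An upgrade that is still in progress (even if paused) may block
    # cephadm, so stop it before touching any daemons.
    if upgrade_status.get("in_progress"):
        commands.append("ceph orch upgrade stop")
    # Then retry the redeploy, one daemon at a time.
    for name in daemons:
        commands.append(f"ceph orch daemon redeploy {name}")
    return commands


if __name__ == "__main__":
    # Trimmed copy of the upgrade status quoted in the original message.
    status = json.loads('{"in_progress": true, "is_paused": true}')
    for cmd in plan_recovery(status, ["mds.mds01.ceph05.pqxmvt"]):
        print(cmd)
```

If the redeploy still fails after that, the next place to look is cephadm.log on the affected host (typically under /var/log/ceph/) and the active mgr's log.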
Regards,
Eugen
Quoting Thomas Widhalm <widhalmt@xxxxxxxxxxxxx>:
Hi,
As you might know, I have a problem with MDS daemons not starting. While investigating it with your help, I found another issue that might be related.
I can schedule restarts, redeploys, and reconfigurations of services via cephadm or the dashboard as often as I want, but the services don't react. I only see that the action has been scheduled, and nothing more.
2023-04-13T17:11:15.690698+0000 mgr.ceph04.qaexpv (mgr.74906907) 37184 : cephadm [INF] Schedule redeploy daemon mds.mds01.ceph05.pqxmvt
2023-04-13T17:11:20.746743+0000 mgr.ceph04.qaexpv (mgr.74906907) 37190 : cephadm [INF] Schedule redeploy daemon mds.mds01.ceph06.rrxmks
2023-04-13T17:11:24.971226+0000 mgr.ceph04.qaexpv (mgr.74906907) 37195 : cephadm [INF] Schedule redeploy daemon mds.mds01.ceph07.omdisd
It's the same for other daemons and services. I changed placement rules, scheduled changes, failed over the mgr, and even rebooted hosts. I was even desperate enough to delete the services' files from the hosts before rebooting, hoping to trigger a redeploy that way.
All I see are the same MDS daemons stuck in "error" state. I removed them via "ceph orch rm", but they are still there. When I reissue the command, it fails, saying that the service doesn't exist.
"ceph orch ps" still lists them:
mds.mds01.ceph03.xqwdjy ceph03 error 2d ago 2M - - <unknown> <unknown> <unknown>
mds.mds01.ceph04.hcmvae ceph04 error 2d ago 2d - - <unknown> <unknown> <unknown>
mds.mds01.ceph05.pqxmvt ceph05 error 2d ago 10M - - <unknown> <unknown> <unknown>
mds.mds01.ceph06.rrxmks ceph06 error 2d ago 10w - - <unknown> <unknown> <unknown>
mds.mds01.ceph07.omdisd ceph07 error 2d ago 3M - - <unknown> <unknown> <unknown>
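To pick out exactly the daemons that the orchestrator reports in "error" state (and so the names a redeploy would target), the JSON form of the listing is easier to process than the text table. Below is a minimal sketch, assuming that `ceph orch ps --format json` returns a list of objects carrying `daemon_type`, `daemon_id` and `status_desc` keys. `error_daemons` is a hypothetical helper, and the sample entries are made up for illustration.

```python
import json


def error_daemons(orch_ps_json: str) -> list[str]:
    """Return full daemon names whose status_desc is "error".

    Assumes the input is the output of `ceph orch ps --format json`,
    i.e. a JSON list of objects with daemon_type, daemon_id, status_desc.
    """
    names = []
    for d in json.loads(orch_ps_json):
        if d.get("status_desc") == "error":
            # Full daemon name is "<type>.<id>", e.g. mds.mds01.ceph03.xqwdjy
            names.append(f"{d['daemon_type']}.{d['daemon_id']}")
    return names


if __name__ == "__main__":
    # Illustrative sample only, not real cluster output.
    sample = json.dumps([
        {"daemon_type": "mds", "daemon_id": "mds01.ceph03.xqwdjy",
         "hostname": "ceph03", "status_desc": "error"},
        {"daemon_type": "mgr", "daemon_id": "ceph04.qaexpv",
         "hostname": "ceph04", "status_desc": "running"},
    ])
    print(error_daemons(sample))
```

Filtering on the reported status, rather than parsing the column layout of the plain-text table, keeps the helper independent of column widths and of empty columns such as PORTS.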
Any idea how I can get rid of them, or redeploy them?
Additionally, I'm currently in the middle of an upgrade:
{
  "target_image": "quay.io/ceph/ceph@sha256:1161e35e4e02cf377c93b913ce78773f8413f5a8d7c5eaee4b4773a4f9dd6635",
  "in_progress": true,
  "which": "Upgrading all daemon types on all hosts",
  "services_complete": [
    "crash",
    "mgr",
    "mon",
    "osd"
  ],
  "progress": "18/40 daemons upgraded",
  "message": "Upgrade paused",
  "is_paused": true
}
I paused it on purpose to allow for manipulation of daemons.
Cheers,
Thomas
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx