Cephadm only scheduling, not orchestrating daemons

Hi,

As you might know, I have a problem with my MDS daemons not starting. During the investigation, with your help, I found another issue that might be related.

I can schedule restarts, redeploys, and reconfigurations of services via cephadm or the dashboard as much as I like, but the services never react.
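For reference, this is roughly what I issued (the dashboard redeploy action should amount to the same; daemon names taken from my cluster):

ceph orch daemon redeploy mds.mds01.ceph05.pqxmvt
ceph orch daemon redeploy mds.mds01.ceph06.rrxmks
ceph orch daemon redeploy mds.mds01.ceph07.omdisd

All the cephadm log ever shows is the scheduling: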

2023-04-13T17:11:15.690698+0000 mgr.ceph04.qaexpv (mgr.74906907) 37184 : cephadm [INF] Schedule redeploy daemon mds.mds01.ceph05.pqxmvt
2023-04-13T17:11:20.746743+0000 mgr.ceph04.qaexpv (mgr.74906907) 37190 : cephadm [INF] Schedule redeploy daemon mds.mds01.ceph06.rrxmks
2023-04-13T17:11:24.971226+0000 mgr.ceph04.qaexpv (mgr.74906907) 37195 : cephadm [INF] Schedule redeploy daemon mds.mds01.ceph07.omdisd


It's the same for other daemons/services. I have changed placement rules, scheduled changes, failed over the mgr, and even rebooted hosts. I was even desperate enough to delete the service files from the hosts before rebooting, hoping to trigger a manual redeploy.
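The failover and the placement change looked roughly like this (the placement below is illustrative, not my exact spec):

ceph mgr fail                                                  # fail over the active mgr
ceph orch apply mds mds01 --placement="ceph05 ceph06 ceph07"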

All I see is the same MDS daemons stuck in the "error" state. I removed them via "ceph orch rm", but they are still there. When I reissue the command, it fails, saying that the service doesn't exist.
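Concretely, something like this (assuming the service name mds.mds01, which matches the daemon names below):

ceph orch rm mds.mds01     # first run: accepted
ceph orch rm mds.mds01     # rerun: fails, service not found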

"ceph orch ps" still lists them.

mds.mds01.ceph03.xqwdjy  ceph03  error  2d ago  2M   -  -  <unknown>  <unknown>  <unknown>
mds.mds01.ceph04.hcmvae  ceph04  error  2d ago  2d   -  -  <unknown>  <unknown>  <unknown>
mds.mds01.ceph05.pqxmvt  ceph05  error  2d ago  10M  -  -  <unknown>  <unknown>  <unknown>
mds.mds01.ceph06.rrxmks  ceph06  error  2d ago  10w  -  -  <unknown>  <unknown>  <unknown>
mds.mds01.ceph07.omdisd  ceph07  error  2d ago  3M   -  -  <unknown>  <unknown>  <unknown>

Any idea how I can get rid of them? Or redeploy them?

Additionally, I'm in the middle of an upgrade; "ceph orch upgrade status" reports:

{
    "target_image": "quay.io/ceph/ceph@sha256:1161e35e4e02cf377c93b913ce78773f8413f5a8d7c5eaee4b4773a4f9dd6635",
    "in_progress": true,
    "which": "Upgrading all daemon types on all hosts",
    "services_complete": [
        "crash",
        "mgr",
        "mon",
        "osd"
    ],
    "progress": "18/40 daemons upgraded",
    "message": "Upgrade paused",
    "is_paused": true
}


I paused it on purpose to allow manipulating the daemons.
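In other words:

ceph orch upgrade pause      # what I used to pause
ceph orch upgrade resume     # to continue once the daemons are sorted out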

Cheers,
Thomas