Re: Orchestration seems not to work

Hi,

What I'm seeing a lot is this: "[stats WARNING root] cmdtag not found in client metadata". I can't make anything of it, but I guess it's not showing the initial issue.

Now that I think of it - I started the cluster with 3 nodes which are now only used as OSDs. Could it be that something is missing on the new nodes that are now used as mgr/mon?
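
In case it's useful, this is roughly how I've been trying to check those new nodes (the hostname below is just a placeholder for one of my actual nodes; I'm not sure these are the right commands for that):

ceph orch host ls
ceph cephadm check-host <new-mgr-node>
ceph orch ps <new-mgr-node>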

Cheers,
Thomas

On 04.05.23 14:48, Eugen Block wrote:
Hi,

try setting debug logs for the mgr:

ceph config set mgr mgr/cephadm/log_level debug

This should hopefully provide more details about what the mgr is trying and where it's failing. Last week this helped me identify an issue on an older Pacific cluster. Do you see anything in the cephadm.log pointing to the mgr actually trying something?
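
If I remember correctly, you can also follow what cephadm is doing via the cluster log; something along these lines should work (the cluster log level for cephadm may need to be raised separately):

ceph config set mgr mgr/cephadm/log_to_cluster_level debug
ceph -W cephadm --watch-debug

and afterwards the recent messages can be shown with "ceph log last cephadm".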


Quoting Thomas Widhalm <widhalmt@xxxxxxxxxxxxx>:

Hi,

I'm in the process of upgrading my cluster from 17.2.5 to 17.2.6, but the following problem already existed when I was still on 17.2.5 everywhere.

I had a major issue in my cluster which could be solved with a lot of your help and even more trial and error. Right now it seems that most of it is already fixed, but I can't rule out that there's still some hidden problem. The very issue I'm asking about started during the repair.

When I want to orchestrate the cluster, it logs the command but doesn't do anything, no matter whether I use the Ceph dashboard or "ceph orch" in "cephadm shell". I don't get any error message when I try to deploy new services, redeploy them, etc. The log only says "scheduled" and that's it. The same happens when I change placement rules. Usually I use tags, but since they don't work anymore either, I tried host-based placement and unmanaged. No success. The only way I can actually start and stop containers is via systemctl from the host itself.

When I run "ceph orch ls" or "ceph orch ps" I see services I deployed for testing being deleted (for weeks now). And especially a lot of old MDS daemons are listed as "error" or "starting". The list doesn't match reality at all because I had to start them by hand.

I tried "ceph mgr fail" and even a complete shutdown of the whole cluster with all nodes including all mgs, mds even osd - everything during a maintenance window. Didn't change anything.

Could you help me? To be honest, I'm still rather new to Ceph, and since I didn't find anything in the logs that caught my eye, I would be thankful for hints on how to debug this.

Cheers,
Thomas
--
http://www.widhalm.or.at
GnuPG : 6265BAE6 , A84CB603
Threema: H7AV7D33
Telegram, Signal: widhalmt@xxxxxxxxxxxxx



_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
