What specifically does `ceph log last 200 debug cephadm` spit out? The log lines you've posted so far don't look like they were generated by the orchestrator, so I'm curious what the last actions it took were (and how long ago).
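Something like this should capture it (a rough sketch, assuming the mgr/cephadm debug level from Eugen's earlier mail is still in place; the log_to_cluster_level and "ceph -W" commands are from the cephadm docs as I remember them, so double-check them on your version):

    # raise cephadm logging on the mgr (same command Eugen suggested)
    ceph config set mgr mgr/cephadm/log_level debug
    # if the cluster log still only shows info-level lines, this knob
    # (from the cephadm docs, if I remember it right) may be needed too
    ceph config set mgr mgr/cephadm/log_to_cluster_level debug

    # fail over the mgr so a fresh orchestrator run gets logged
    ceph mgr fail

    # wait a minute or two, then pull the newest cephadm entries
    ceph log last 200 debug cephadm

    # or follow the channel live while re-running a "ceph orch" command
    ceph -W cephadm --watch-debug

If the newest entries there are from days or weeks ago, that would match the orchestrator scheduling things but never acting on them.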
On Thu, May 4, 2023 at 10:35 AM Thomas Widhalm <widhalmt@xxxxxxxxxxxxx> wrote:
> To completely rule out hung processes, I managed to get another short
> shutdown.
>
> Now I'm seeing lots of:
>
> mgr.server handle_open ignoring open from mds.mds01.ceph01.usujbi
> v2:192.168.23.61:6800/2922006253; not ready for session (expect reconnect)
> mgr finish mon failed to return metadata for mds.mds01.ceph02.otvipq:
> (2) No such file or directory
>
> log lines. Seems like it now realises that some of this information is
> stale. But it looks like it's just waiting for it to come back and not
> doing anything about it.
>
> On 04.05.23 14:48, Eugen Block wrote:
> > Hi,
> >
> > try setting debug logs for the mgr:
> >
> > ceph config set mgr mgr/cephadm/log_level debug
> >
> > This should provide more details on what the mgr is trying and where
> > it's failing, hopefully. Last week this helped me identify an issue on
> > a lower Pacific release.
> > Do you see anything in the cephadm.log pointing to the mgr actually
> > trying something?
> >
> >
> > Zitat von Thomas Widhalm <widhalmt@xxxxxxxxxxxxx>:
> >
> >> Hi,
> >>
> >> I'm in the process of upgrading my cluster from 17.2.5 to 17.2.6, but
> >> the following problem already existed when I was still on 17.2.5
> >> everywhere.
> >>
> >> I had a major issue in my cluster which could be solved with a lot of
> >> your help and even more trial and error. Right now it seems that most
> >> of it is already fixed, but I can't rule out that there's still some
> >> problem hidden. The very issue I'm asking about started during the
> >> repair.
> >>
> >> When I want to orchestrate the cluster, it logs the command but
> >> doesn't do anything, no matter whether I use the Ceph dashboard or
> >> "ceph orch" in "cephadm shell". I don't get any error message when I
> >> try to deploy new services, redeploy them, etc. The log only says
> >> "scheduled" and that's it. Same when I change placement rules. Usually
> >> I use tags, but since they don't work anymore either, I tried host and
> >> unmanaged placements. No success. The only way I can actually start
> >> and stop containers is via systemctl from the host itself.
> >>
> >> When I run "ceph orch ls" or "ceph orch ps" I see services I deployed
> >> for testing still being deleted (for weeks now). And especially a lot
> >> of old MDS are listed as "error" or "starting". The list doesn't match
> >> reality at all because I had to start them by hand.
> >>
> >> I tried "ceph mgr fail" and even a complete shutdown of the whole
> >> cluster with all nodes, including all mgr, mds and even osd daemons -
> >> everything during a maintenance window. It didn't change anything.
> >>
> >> Could you help me? To be honest, I'm still rather new to Ceph, and
> >> since I didn't find anything in the logs that caught my eye, I would
> >> be thankful for hints on how to debug this.
> >>
> >> Cheers,
> >> Thomas
> >> --
> >> http://www.widhalm.or.at
> >> GnuPG : 6265BAE6 , A84CB603
> >> Threema: H7AV7D33
> >> Telegram, Signal: widhalmt@xxxxxxxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx