Re: Troubleshooting cephadm - not deploying any daemons

To see what cephadm is doing you can check two places: the log at
*/var/log/ceph/cephadm.log* (this shows what the cephadm binary running on
each host is doing), and the logs of the mgr container, which show what the
cephadm mgr module is doing:

> podman logs -f `podman ps | grep mgr. | awk '{print $1}'`.
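
If grepping the podman output is awkward, you can also get the same mgr
daemon logs via cephadm itself (a sketch; the daemon name below is just a
placeholder, look up the real one first):

> ceph orch ps | grep mgr
(note the full daemon name, e.g. mgr.ceph03.abcdef -- placeholder)
> cephadm logs --name mgr.ceph03.abcdef
(run this on the host where that daemon lives; it shows the daemon's
journald logs)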

Normally those mgr logs show what cephadm is trying to do. To get more
debug output from cephadm you can raise the log level:

> cephadm shell
(and from the shell)
> ceph config set mgr mgr/cephadm/log_to_cluster_level info
> ceph log last 100 debug cephadm (to dump the last 100 messages)

You can set the level to debug as well, but be aware that it prints a lot
of messages.
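
For example (a sketch based on the operations page linked below; adjust as
needed), you can switch to debug, follow the cluster log live while you
re-run your 'ceph orch apply ...' command, and then drop back down:

> ceph config set mgr mgr/cephadm/log_to_cluster_level debug
> ceph -W cephadm --watch-debug
(re-run the apply command in another shell and watch what cephadm logs)
> ceph config set mgr mgr/cephadm/log_to_cluster_level info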

BTW: you can find this info on:
https://docs.ceph.com/en/quincy/cephadm/operations/
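
A quick sanity check is also to confirm that the orchestrator backend is
available at all (a sketch; the quoted output is the usual form, yours may
differ slightly):

> ceph orch status
(should report 'Backend: cephadm' and 'Available: Yes')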





On Wed, Jun 8, 2022 at 11:47 PM Zach Heise (SSCC) <heise@xxxxxxxxxxxx>
wrote:

> Yes - running tail on /var/log/ceph/cephadm.log on ceph01, then running
> 'ceph orch apply mgr "ceph01,ceph03"' (my active manager is on ceph03
> and I don't want to clobber it while troubleshooting)
>
> the log output on ceph01's cephadm.log is merely the following lines,
> over and over again, 6 times in a row, then a minute passes, then
> another 6 copies of the following text, and repeat forever. There is
> nothing listed in it about attempting the deployment of a new daemon.
>
> cephadm ['gather-facts']
> 2022-06-08 16:36:42,275 7f7c1ef9fb80 DEBUG /bin/podman: 3.2.3
> 2022-06-08 16:36:42,520 7f7c1ef9fb80 DEBUG sestatus: SELinux status:                 enabled
> 2022-06-08 16:36:42,520 7f7c1ef9fb80 DEBUG sestatus: SELinuxfs mount:                /sys/fs/selinux
> 2022-06-08 16:36:42,520 7f7c1ef9fb80 DEBUG sestatus: SELinux root directory:         /etc/selinux
> 2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Loaded policy name:             targeted
> 2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Current mode:                   enforcing
> 2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Mode from config file:          enforcing
> 2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Policy MLS status:              enabled
> 2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Policy deny_unknown status:     allowed
> 2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Memory protection checking:     actual (secure)
> 2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Max kernel policy version:      31
>
>
>
> On 2022-06-08 4:30 PM, Eugen Block wrote:
> > Have you checked /var/log/ceph/cephadm.log on the target nodes?
> >
> > Zitat von "Zach Heise (SSCC)" <heise@xxxxxxxxxxxx>:
> >
> >>  Yes, sorry - I tried both 'ceph orch apply mgr "ceph01,ceph03"' and
> >> 'ceph orch apply mds "ceph04,ceph05"' before writing this initial
> >> email - once again, the same logged message: "6/8/22 2:25:12
> >> PM[INF]Saving service mgr spec with placement ceph03;ceph01" but
> >> there's no messages logged about attempting to create the mgr daemon.
> >>
> >> I tried this at the same time that I tried ''ceph orch apply mgr
> >> --placement=2' that I mentioned in my original email.
> >>
> >> I think what I need is some advice on how to check cephadm's status -
> >> I assume it should be logging every time it tries to deploy a new
> >> daemon right? That should be my next stop, I think - looking at that
> >> log to see if it's even trying. I just don't know how to get to that
> >> point.
> >>
> >> And it's not just mgr daemons, it's any kind of daemon so far, is not
> >> getting deployed.
> >>
> >> But thank you for the advice, Dhairya.
> >> -Zach
> >>
> >> On 2022-06-08 3:44 PM, Dhairya Parmar wrote:
> >>> Hi Zach,
> >>>
> >>> Try running `ceph orch apply mgr 2` or `ceph orch apply mgr
> >>> --placement="<host1> <host2>"`. Refer this
> >>> <https://docs.ceph.com/en/latest/cephadm/services/#orchestrator-cli-placement-spec>
> >>> doc for more information, hope it helps.
> >>>
> >>> Regards,
> >>> Dhairya
> >>>
> >>> On Thu, Jun 9, 2022 at 1:59 AM Zach Heise (SSCC)
> >>> <heise@xxxxxxxxxxxx> wrote:
> >>>
> >>>    Our 16.2.7 cluster was deployed using cephadm from the start, but
> >>>    now it
> >>>    seems like deploying daemons with it is broken. Running 'ceph orch
> >>>    apply
> >>>    mgr --placement=2' causes '6/8/22 2:34:18 PM[INF]Saving service
> >>>    mgr spec
> >>>    with placement count:2' to appear in the logs, but a 2nd mgr does
> >>> not
> >>>    get created.
> >>>
> >>>    I also confirmed the same with mds daemons - using the dashboard, I
> >>>    tried creating a new set of MDS daemons "220606" count:3, but they
> >>>    never
> >>>    got deployed. The service type appears in the dashboard, though,
> >>> just
> >>>    with no daemons deployed under it. Then I tried to delete it with
> >>> the
> >>>    dashboard, and now 'ceph orch ls' outputs:
> >>>
> >>>    NAME        PORTS  RUNNING  REFRESHED   AGE  PLACEMENT
> >>>    mds.220606         0/3      <deleting>  15h  count:3
> >>>
> >>>    More detail in YAML format doesn't even give me that much
> >>> information:
> >>>
> >>>    ceph01> ceph orch ls --service_name=mds.220606 --format yaml
> >>>    service_type: mds
> >>>    service_id: '220606'
> >>>    service_name: mds.220606
> >>>    placement:
> >>>       count: 3
> >>>    status:
> >>>       created: '2022-06-07T03:42:57.234124Z'
> >>>       running: 0
> >>>       size: 3
> >>>    events:
> >>>    - 2022-06-07T03:42:57.301349Z service:mds.220606 [INFO] "service was
> >>>    created"
> >>>
> >>>    'ceph health detail' reports HEALTH_OK but cephadm doesn't seem
> >>> to be
> >>>    doing its job. I read through the Cephadm troubleshooting page on
> >>>    ceph's
> >>>    website but since the daemons I'm trying to create don't even
> >>> seem to
> >>>    try to spawn containers (podman ps shows the existing containers
> >>> just
> >>>    fine) I don't know where to look next for logs, to see if cephadm +
> >>>    podman are trying to create new containers and failing, or not
> >>>    even trying.
> >>>
> >>>    _______________________________________________
> >>>    ceph-users mailing list -- ceph-users@xxxxxxx
> >>>    To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >>>
> >> _______________________________________________
> >> ceph-users mailing list -- ceph-users@xxxxxxx
> >> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >
> >
> >
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


