Yes - I ran tail on /var/log/ceph/cephadm.log on ceph01 and then ran
'ceph orch apply mgr "ceph01,ceph03"' (my active manager is on ceph03
and I don't want to clobber it while troubleshooting). The only output
in ceph01's cephadm.log is the lines below, repeated six times in a
row, then a minute of silence, then another six copies, and so on
forever. There is nothing in it about attempting to deploy a new
daemon.
cephadm ['gather-facts']
2022-06-08 16:36:42,275 7f7c1ef9fb80 DEBUG /bin/podman: 3.2.3
2022-06-08 16:36:42,520 7f7c1ef9fb80 DEBUG sestatus: SELinux status: enabled
2022-06-08 16:36:42,520 7f7c1ef9fb80 DEBUG sestatus: SELinuxfs mount: /sys/fs/selinux
2022-06-08 16:36:42,520 7f7c1ef9fb80 DEBUG sestatus: SELinux root directory: /etc/selinux
2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Loaded policy name: targeted
2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Current mode: enforcing
2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Mode from config file: enforcing
2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Policy MLS status: enabled
2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Policy deny_unknown status: allowed
2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Memory protection checking: actual (secure)
2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Max kernel policy version: 31
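As far as I can tell from the cephadm troubleshooting docs, the
per-host cephadm.log only records what the cephadm binary does locally
(hence the endless gather-facts), while the orchestrator's own
decisions are logged through the active mgr. If that's right, something
like the following, run from a node with the admin keyring, should show
whether the module even reacts to the apply command - I haven't
verified this yet, so treat it as a guess from the docs:

  ceph config set mgr mgr/cephadm/log_to_cluster_level debug
  ceph -W cephadm --watch-debug   # follow the orchestrator log channel live
  ceph log last cephadm           # or just dump the most recent entries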
On 2022-06-08 4:30 PM, Eugen Block wrote:
Have you checked /var/log/ceph/cephadm.log on the target nodes?
Quoting "Zach Heise (SSCC)" <heise@xxxxxxxxxxxx>:
Yes, sorry - I tried both 'ceph orch apply mgr "ceph01,ceph03"' and
'ceph orch apply mds "ceph04,ceph05"' before writing this initial
email. Once again I got the same logged message - "6/8/22 2:25:12
PM[INF]Saving service mgr spec with placement ceph03;ceph01" - but no
messages about actually attempting to create the mgr daemon. I tried
this at the same time as the 'ceph orch apply mgr --placement=2' that
I mentioned in my original email.
I think what I need is some advice on how to check cephadm's status -
I assume it should be logging every time it tries to deploy a new
daemon, right? That should be my next stop: looking at that log to see
if it's even trying. I just don't know how to get to that point (see
the commands sketched below).
And it's not just mgr daemons - so far no kind of daemon is getting
deployed.
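Concretely, what I'm planning to try next, going by the cephadm
troubleshooting page (assuming the orchestrator module is enabled and
not paused):

  ceph orch status                   # should report the cephadm backend as available
  ceph mgr module ls | grep cephadm  # confirm the module is actually on
  ceph orch ps --daemon-type mgr     # what cephadm thinks is running for mgr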
But thank you for the advice, Dhairya.
-Zach
On 2022-06-08 3:44 PM, Dhairya Parmar wrote:
Hi Zach,
Try running `ceph orch apply mgr 2` or `ceph orch apply mgr
--placement="<host1> <host2>"`. Refer to this
<https://docs.ceph.com/en/latest/cephadm/services/#orchestrator-cli-placement-spec>
doc for more information; I hope it helps.
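The same page also covers putting the placement into a service spec
file and applying that instead; a minimal sketch (the filename and
hostnames are just placeholders) would be:

  # mgr.yaml
  service_type: mgr
  placement:
    hosts:
      - host1
      - host2

  ceph orch apply -i mgr.yaml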
Regards,
Dhairya
On Thu, Jun 9, 2022 at 1:59 AM Zach Heise (SSCC)
<heise@xxxxxxxxxxxx> wrote:
Our 16.2.7 cluster was deployed using cephadm from the start, but now
it seems like deploying daemons with it is broken. Running
'ceph orch apply mgr --placement=2' causes '6/8/22 2:34:18 PM[INF]Saving
service mgr spec with placement count:2' to appear in the logs, but a
2nd mgr does not get created.
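The only orch commands I know of to cross-check the saved spec against
what actually got deployed are along the lines of:

  ceph orch ls mgr            # expected vs. running count for the mgr service
  ceph orch ls mgr --export   # dump the spec the orchestrator saved

so pointers to anything lower-level would be appreciated.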
I also confirmed the same with mds daemons - using the dashboard, I
tried creating a new set of MDS daemons "220606" count:3, but they
never got deployed. The service type appears in the dashboard, though,
just with no daemons deployed under it. Then I tried to delete it with
the dashboard, and now 'ceph orch ls' outputs:

NAME        PORTS  RUNNING  REFRESHED   AGE  PLACEMENT
mds.220606         0/3      <deleting>  15h  count:3
More detail in YAML format doesn't even give me that much information:

ceph01> ceph orch ls --service_name=mds.220606 --format yaml
service_type: mds
service_id: '220606'
service_name: mds.220606
placement:
  count: 3
status:
  created: '2022-06-07T03:42:57.234124Z'
  running: 0
  size: 3
events:
- 2022-06-07T03:42:57.301349Z service:mds.220606 [INFO] "service was created"
'ceph health detail' reports HEALTH_OK, but cephadm doesn't seem to be
doing its job. I read through the cephadm troubleshooting page on
ceph's website, but since the daemons I'm trying to create don't even
seem to try to spawn containers (podman ps shows the existing
containers just fine), I don't know where to look next for logs to see
whether cephadm + podman are trying to create new containers and
failing, or not even trying (the host-side checks I know of so far are
sketched below).
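For reference, these are the host-side checks I've found in the cephadm
docs (the daemon name and fsid below are placeholders, not from my
cluster):

  cephadm ls                              # what cephadm thinks is deployed on this host
  cephadm logs --name mgr.ceph01.abcdef   # journald logs for one daemon
  journalctl -u 'ceph-<fsid>@*'           # all ceph systemd units for the cluster fsid

But since no new containers ever show up in podman ps, I suspect the
problem is on the mgr side rather than on the hosts.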
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx