Can you share more details about the cluster, like the output of 'ceph -s'
and 'ceph orch ls'? Have you tried an MGR failover just to see if that
clears anything? Also, the active mgr log should contain at least some
information. How did you deploy the current services when bootstrapping
the cluster? Has anything changed regarding security/firewall or anything
like that?
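For example, something along these lines (the mgr daemon name below is
just a placeholder, adjust it to your cluster):

    ceph -s
    ceph orch ls
    ceph mgr fail                          # fail over to a standby mgr
    cephadm logs --name mgr.<daemon-name>  # on the host running the active mgr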
Zitat von "Zach Heise (SSCC)" <heise@xxxxxxxxxxxx>:
Yes - I ran tail on /var/log/ceph/cephadm.log on ceph01 while running
'ceph orch apply mgr "ceph01,ceph03"' (my active manager is on ceph03 and
I don't want to clobber it while troubleshooting). The output in ceph01's
cephadm.log is merely the following lines, repeated six times in a row,
then a minute passes, then another six copies of the same text, and so on
forever. There is nothing in it about attempting to deploy a new daemon.
cephadm ['gather-facts']
2022-06-08 16:36:42,275 7f7c1ef9fb80 DEBUG /bin/podman: 3.2.3
2022-06-08 16:36:42,520 7f7c1ef9fb80 DEBUG sestatus: SELinux status: enabled
2022-06-08 16:36:42,520 7f7c1ef9fb80 DEBUG sestatus: SELinuxfs mount: /sys/fs/selinux
2022-06-08 16:36:42,520 7f7c1ef9fb80 DEBUG sestatus: SELinux root directory: /etc/selinux
2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Loaded policy name: targeted
2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Current mode: enforcing
2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Mode from config file: enforcing
2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Policy MLS status: enabled
2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Policy deny_unknown status: allowed
2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Memory protection checking: actual (secure)
2022-06-08 16:36:42,521 7f7c1ef9fb80 DEBUG sestatus: Max kernel policy version: 31
On 2022-06-08 4:30 PM, Eugen Block wrote:
Have you checked /var/log/ceph/cephadm.log on the target nodes?
Zitat von "Zach Heise (SSCC)" <heise@xxxxxxxxxxxx>:
Yes, sorry - I tried both 'ceph orch apply mgr "ceph01,ceph03"' and
'ceph orch apply mds "ceph04,ceph05"' before writing this initial email -
once again, the same message was logged: "6/8/22 2:25:12 PM[INF]Saving
service mgr spec with placement ceph03;ceph01", but there are no messages
logged about attempting to create the mgr daemon.
I tried this at the same time as the 'ceph orch apply mgr --placement=2'
that I mentioned in my original email.
I think what I need is some advice on how to check cephadm's status - I
assume it should be logging every time it tries to deploy a new daemon,
right? That should be my next stop, I think - looking at that log to see
if it's even trying. I just don't know how to get to that point.
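For what it's worth, the cephadm troubleshooting docs describe routing the
cephadm mgr module's log into the cluster log, roughly like this (remember
to set the level back to info afterwards):

    ceph config set mgr mgr/cephadm/log_to_cluster_level debug
    ceph -W cephadm --watch-debug     # follow the cephadm channel live
    ceph log last 100 debug cephadm   # or dump the last 100 debug entries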
And it's not just mgr daemons - so far, no kind of daemon is getting
deployed.
But thank you for the advice, Dhairya.
-Zach
On 2022-06-08 3:44 PM, Dhairya Parmar wrote:
Hi Zach,
Try running `ceph orch apply mgr 2` or `ceph orch apply mgr
--placement="<host1> <host2>"`. Refer to this
<https://docs.ceph.com/en/latest/cephadm/services/#orchestrator-cli-placement-spec> doc for more information; hope it
helps.
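The same placement can also be written as a service spec file and applied
with `ceph orch apply -i`, for example (the hostnames here are just
placeholders):

    # mgr-spec.yaml
    service_type: mgr
    placement:
      hosts:
        - <host1>
        - <host2>

    ceph orch apply -i mgr-spec.yaml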
Regards,
Dhairya
On Thu, Jun 9, 2022 at 1:59 AM Zach Heise (SSCC)
<heise@xxxxxxxxxxxx> wrote:
Our 16.2.7 cluster was deployed using cephadm from the start, but now it
seems like deploying daemons with it is broken. Running
'ceph orch apply mgr --placement=2' causes
'6/8/22 2:34:18 PM[INF]Saving service mgr spec with placement count:2' to
appear in the logs, but a 2nd mgr does not get created.
I also confirmed the same with mds daemons - using the dashboard, I
tried creating a new set of MDS daemons "220606" count:3, but they
never
got deployed. The service type appears in the dashboard, though, just
with no daemons deployed under it. Then I tried to delete it with the
dashboard, and now 'ceph orch ls' outputs:
NAME        PORTS  RUNNING  REFRESHED   AGE  PLACEMENT
mds.220606         0/3      <deleting>  15h  count:3
More detail in YAML format doesn't even give me that much information:
ceph01> ceph orch ls --service_name=mds.220606 --format yaml
service_type: mds
service_id: '220606'
service_name: mds.220606
placement:
  count: 3
status:
  created: '2022-06-07T03:42:57.234124Z'
  running: 0
  size: 3
events:
- 2022-06-07T03:42:57.301349Z service:mds.220606 [INFO] "service was created"
'ceph health detail' reports HEALTH_OK, but cephadm doesn't seem to be
doing its job. I read through the cephadm troubleshooting page on Ceph's
website, but since the daemons I'm trying to create don't even seem to
try to spawn containers (podman ps shows the existing containers just
fine), I don't know where to look next for logs to see whether cephadm
and podman are trying to create new containers and failing, or not even
trying.
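A few things that might at least show whether a container launch was ever
attempted (just a sketch, not an exhaustive checklist):

    ceph orch ps --daemon_type mds   # what the orchestrator thinks is deployed
    cephadm ls                       # run on the target host; daemons cephadm knows about
    podman ps -a                     # run on the target host; includes exited containers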
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx