Did you check the permissions? To me it reads like the "Permission
denied" errors prevent the MONs from starting, and as a result they
are removed from the monmap:
ceph-8c774934-1535-11ec-973e-525400130e4f-mon-sparci-store1[786211]:
debug 2022-12-13T10:24:21.599+0000 7f317ba4d700 -1
mon.sparci-store1@1(probing) e5 handle_auth_bad_method hmm, they
didn't like 2 result (13) Permission denied
Dec 13 11:24:21 sparci-store1
ceph-8c774934-1535-11ec-973e-525400130e4f-mon-sparci-store1[786211]:
debug 2022-12-13T10:24:21.600+0000 7f3177a45700 0
mon.sparci-store1@1(probing) e18 removed from monmap, suicide.
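To spot this sequence in a longer journal, a small script can help. This is only a minimal sketch and not something from the original exchange: it assumes the log prefix has the shape seen above (mon.<name>@<rank>(<state>) e<epoch> <message>), and the names scan_mon_log and SAMPLE are made up for illustration. It also reports the number after "e", which in the excerpt jumps from e5 to e18 right before the mon shuts itself down.

```python
import re

# Prefix as it appears in the excerpt, e.g. "mon.sparci-store1@1(probing) e18 "
PREFIX = re.compile(r"mon\.([\w.-]+)@(-?\d+)\(([^)]*)\)\s+e(\d+)\s+")

# Two wrapped log entries copied from the thread (journal prefix trimmed).
SAMPLE = """
debug 2022-12-13T10:24:21.599+0000 7f317ba4d700 -1
mon.sparci-store1@1(probing) e5 handle_auth_bad_method hmm, they
didn't like 2 result (13) Permission denied
debug 2022-12-13T10:24:21.600+0000 7f3177a45700 0
mon.sparci-store1@1(probing) e18 removed from monmap, suicide.
"""


def scan_mon_log(text):
    """Return (epoch, kind) tuples for auth failures and monmap removals."""
    # The mail hard-wraps log entries, so collapse all whitespace first.
    flat = " ".join(text.split())
    matches = list(PREFIX.finditer(flat))
    events = []
    for i, m in enumerate(matches):
        # The message runs from the end of this prefix to the next prefix.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(flat)
        msg = flat[m.end():end]
        epoch = int(m.group(4))
        if "Permission denied" in msg:
            events.append((epoch, "auth_denied"))
        if "removed from monmap" in msg:
            events.append((epoch, "removed_from_monmap"))
    return events


if __name__ == "__main__":
    for epoch, kind in scan_mon_log(SAMPLE):
        print(f"e{epoch}: {kind}")
```

Because the message text is taken as everything up to the next mon prefix, unrelated journal lines in between can end up in the same span; for a quick triage that is usually acceptable, but it is not a precise parser.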
Quoting Mevludin Blazevic <mblazevic@xxxxxxxxxxxxxx>:
The keyring is the same, but I found the following log lines:
Dec 13 12:22:18 sparci-store1
ceph-8c774934-1535-11ec-973e-525400130e4f-mon-sparci-store1[813780]:
debug 2022-12-13T11:22:18.016+0000 7f789e7f3700 0
mon.sparci-store1@1(probing) e18 removed from monmap, suicide.
Dec 13 12:22:18 sparci-store1 bash[813882]: Error: no container with
name or ID
"ceph-8c774934-1535-11ec-973e-525400130e4f-mon.sparci-store1" found:
no such container
Dec 13 12:22:18 sparci-store1 bash[813911]: Error: no container with
name or ID
"ceph-8c774934-1535-11ec-973e-525400130e4f-mon-sparci-store1" found:
no such container
Dec 13 12:22:18 sparci-store1 bash[813939]: Error: no container with
name or ID
"ceph-8c774934-1535-11ec-973e-525400130e4f-mon.sparci-store1" found:
no such container
Running cat on /var/lib/ceph/FSID/mon.sparci-store1/config shows
only 2 monitor nodes (the working ones). It seems like Ceph
removed the monitor from the config.
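To make that check repeatable, the monitor list in the config file can be counted with a few lines of code. This is a rough sketch under assumptions: it expects a mon_host (or "mon host") line, optionally with bracketed v2/v1 address groups, and the function name mons_in_config as well as the addresses in SAMPLE_CONF are invented for illustration, not taken from the cluster in this thread.

```python
import re

# Made-up example in the style of a minimal ceph.conf; addresses are placeholders.
SAMPLE_CONF = """
# minimal ceph.conf (made-up addresses for illustration)
[global]
    fsid = 8c774934-1535-11ec-973e-525400130e4f
    mon_host = [v2:192.0.2.11:3300/0,v1:192.0.2.11:6789/0] [v2:192.0.2.12:3300/0,v1:192.0.2.12:6789/0]
"""


def mons_in_config(conf_text):
    """Return one entry per monitor listed on the mon_host line."""
    for line in conf_text.splitlines():
        key, sep, value = line.partition("=")
        # Accept both "mon_host" and "mon host" spellings of the option.
        if sep and key.strip().replace(" ", "_") == "mon_host":
            # Bracketed groups bundle the v2 and v1 address of one monitor.
            groups = re.findall(r"\[([^\]]+)\]", value)
            if groups:
                return groups
            # Otherwise assume a plain comma- or space-separated list.
            return [addr for addr in re.split(r"[,\s]+", value.strip()) if addr]
    return []


if __name__ == "__main__":
    mons = mons_in_config(SAMPLE_CONF)
    print(len(mons), "monitor(s) listed")
    for entry in mons:
        print(" ", entry)
```

The result can then be compared with what the working monitors report, for example via ceph mon dump.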
On 13.12.2022 at 11:43, Eugen Block wrote:
So you get "Permission denied" errors. I'm guessing either the mon
keyring is not present (or wrong) or the mon directory doesn't
belong to the ceph user. Can you check:
ls -l /var/lib/ceph/FSID/mon.sparci-store1/
Compare the keyring file with the ones on the working mon nodes.
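If you want to script both checks, something like the following could work. It is a minimal sketch, not an official tool: the function names are made up, the expected owner "ceph" is an assumption (on cephadm hosts the owner may show up as a numeric uid instead, so take the value from a working mon node), and the digest is only meant to be compared by eye against the same digest computed on the other hosts.

```python
import hashlib
import os
import pwd


def file_owner(path):
    """Return the owning user name of path, or the numeric uid as a string."""
    uid = os.stat(path).st_uid
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # No passwd entry for this uid on the host.
        return str(uid)


def keyring_digest(path):
    """Return the SHA-256 hex digest of a keyring file's contents."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def check_mon_dir(mon_dir, expected_owner="ceph"):
    """List ownership problems for the mon directory and its top-level entries."""
    problems = []
    entries = [os.path.join(mon_dir, name) for name in sorted(os.listdir(mon_dir))]
    for path in [mon_dir] + entries:
        owner = file_owner(path)
        if owner != expected_owner:
            problems.append(f"{path}: owned by {owner}, expected {expected_owner}")
    if not os.path.isfile(os.path.join(mon_dir, "keyring")):
        problems.append(f"{mon_dir}: keyring file is missing")
    return problems


if __name__ == "__main__":
    import sys

    mon_dir = sys.argv[1]  # e.g. the mon directory from the ls -l command above
    for problem in check_mon_dir(mon_dir):
        print(problem)
    keyring = os.path.join(mon_dir, "keyring")
    if os.path.isfile(keyring):
        print("keyring sha256:", keyring_digest(keyring))
```

Run it as root on the broken node and on a working one, then compare the printed owner problems and digests.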
Quoting Mevludin Blazevic <mblazevic@xxxxxxxxxxxxxx>:
Hi Eugen,
I assume the mon db is stored on the "OS disk". I could not find
any error-related lines in cephadm.log; here is what journalctl
-xe tells me:
Dec 13 11:24:21 sparci-store1
ceph-8c774934-1535-11ec-973e-525400130e4f-mon-sparci-store1[786211]: debug
2022-12-13T10:24:21.392+0000 7f318e1fa700 1
mon.sparci-store1@-1(???).paxosservice(auth 251..491) refresh
upgraded, format 0 -> 3
Dec 13 11:24:21 sparci-store1
ceph-8c774934-1535-11ec-973e-525400130e4f-mon-sparci-store1[786211]: debug
2022-12-13T10:24:21.397+0000 7f3179248700 1 heartbeat_map
reset_timeout 'Monitor::cpu_tp thread 0x7f3179248700' had timed
out after 0.000000000s
Dec 13 11:24:21 sparci-store1
ceph-8c774934-1535-11ec-973e-525400130e4f-mon-sparci-store1[786211]: debug
2022-12-13T10:24:21.397+0000 7f318e1fa700 0
mon.sparci-store1@-1(probing) e5 my rank is now 1 (was -1)
Dec 13 11:24:21 sparci-store1
ceph-8c774934-1535-11ec-973e-525400130e4f-mon-sparci-store1[786211]: debug
2022-12-13T10:24:21.398+0000 7f317ba4d700 -1
mon.sparci-store1@1(probing) e5 handle_auth_bad_method hmm, they
didn't like 2 result (13) Permission denied
Dec 13 11:24:21 sparci-store1 systemd[1]: Started Ceph
mon.sparci-store1 for 8c774934-1535-11ec-973e-525400130e4f.
-- Subject: Unit
ceph-8c774934-1535-11ec-973e-525400130e4f@mon.sparci-store1.service has
finished start-up
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- Unit
ceph-8c774934-1535-11ec-973e-525400130e4f@mon.sparci-store1.service has
finished starting up.
--
-- The start-up result is done.
Dec 13 11:24:21 sparci-store1
ceph-8c774934-1535-11ec-973e-525400130e4f-mon-sparci-store1[786211]: debug
2022-12-13T10:24:21.599+0000 7f317ba4d700 -1
mon.sparci-store1@1(probing) e5 handle_auth_bad_method hmm, they
didn't like 2 result (13) Permission denied
Dec 13 11:24:21 sparci-store1
ceph-8c774934-1535-11ec-973e-525400130e4f-mon-sparci-store1[786211]: debug
2022-12-13T10:24:21.600+0000 7f3177a45700 0
mon.sparci-store1@1(probing) e18 removed from monmap, suicide.
Dec 13 11:24:21 sparci-store1 systemd[1]:
var-lib-containers-storage-overlay-2e67bce8ea3795683c4326479c7169a713e9a7630b31f25d60cd45bbd9fa56bd-merged.mount:
Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- The unit
var-lib-containers-storage-overlay-2e67bce8ea3795683c4326479c7169a713e9a7630b31f25d60cd45bbd9fa56bd-merged.mount has successfully entered the 'dead'
state.
Dec 13 11:24:21 sparci-store1 bash[786318]: Error: no container
with name or ID
"ceph-8c774934-1535-11ec-973e-525400130e4f-mon.sparci-store1"
found: no such container
Dec 13 11:24:21 sparci-store1 bash[786346]: Error: no container
with name or ID
"ceph-8c774934-1535-11ec-973e-525400130e4f-mon-sparci-store1"
found: no such container
Dec 13 11:24:21 sparci-store1 bash[786375]: Error: no container
with name or ID
"ceph-8c774934-1535-11ec-973e-525400130e4f-mon.sparci-store1"
found: no such container
Dec 13 11:24:21 sparci-store1 systemd[1]:
ceph-8c774934-1535-11ec-973e-525400130e4f@mon.sparci-store1.service:
Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- The unit
ceph-8c774934-1535-11ec-973e-525400130e4f@mon.sparci-store1.service has
successfully entered the 'dead' state.
Regards,
Mevludin
On 08.12.2022 at 09:30, Eugen Block wrote:
Hi,
Do the MONs use the same SAS interface? They store the mon db on
local disk, so it might be related. But without any logs or more
details, that's just a guess.
Regards,
Eugen
Quoting Mevludin Blazevic <mblazevic@xxxxxxxxxxxxxx>:
Hi all,
I'm running Pacific with cephadm.
After installation, Ceph automatically provisioned 5 monitor
nodes across the cluster. After a few OSDs crashed due to a
hardware-related issue with the SAS interface, 3 monitor
services are stopped and won't restart. Is this related to
the OSD crash problem?
Thanks,
Mevludin
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
--
Mevludin Blazevic, M.Sc.
University of Koblenz-Landau
Computing Centre (GHRKO)
Universitaetsstrasse 1
D-56070 Koblenz, Germany
Room A023
Tel: +49 261/287-1326