Hi all,
I'm in an evaluation stage, implementing a fully virtualized Ceph
Quincy test cluster.
I successfully deployed the first two mons and three OSDs; on the first
mon I also deployed the manager and the dashboard. All deployments
were carried out without any automation (Ansible or other tools), just
manually following docs.ceph.com.
The third monitor is driving me crazy!
I can successfully join mon3 to the cluster with:
# ceph-mon -i `hostname -s` --public-addr {monx-ip-address}
If I run ceph -s I get:
# ceph -s
  cluster:
    id:     787fbe49-8983-4f32-9817-5fc4d7370fd2
    health: HEALTH_WARN
            9 daemons have recently crashed

  services:
    mon: 3 daemons, quorum mon1,mon2,mon3 (age 14m)
    mgr: mon1(active, since 15h)
    osd: 9 osds: 9 up (since 15h), 9 in (since 6d)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 577 KiB
    usage:   123 MiB used, 288 GiB / 288 GiB avail
    pgs:     1 active+clean
When I enable automatic monitor startup with:
# systemctl enable ceph-mon.target
# systemctl enable ceph-mon@`hostname -s`
the start always fails.
Joining the cluster manually with the same command as above works.
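In case it helps narrow things down, this is roughly how I've been
comparing the two startup paths (paths assume the default cluster name
"ceph", so I may be looking in the wrong place):

# systemctl status ceph-mon@`hostname -s`
# journalctl -u ceph-mon@`hostname -s` -b --no-pager
# ls -ld /var/lib/ceph/mon/ceph-`hostname -s`
# ls -l /var/lib/ceph/mon/ceph-`hostname -s`/store.db | head

If I read the packaged unit file correctly, it starts ceph-mon with
--setuser ceph --setgroup ceph, while my manual start above runs as
root, so I'm also keeping an eye on file ownership under the mon data
directory.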
Let's talk about the failing daemons:
# ceph crash info 2022-12-08T09:13:51.228521Z_9f0ab405-757c-4f0a-9a17-e3d55f8e5986
{
    "backtrace": [
        "/lib/x86_64-linux-gnu/libpthread.so.0(+0x13140) [0x7f9e3e973140]",
        "gsignal()",
        "abort()",
        "/lib/x86_64-linux-gnu/libc.so.6(+0x2240f) [0x7f9e3e47c40f]",
        "/lib/x86_64-linux-gnu/libc.so.6(+0x31662) [0x7f9e3e48b662]",
        "(LogMonitor::log_external_backlog()+0xdfc) [0x55ea058ef77c]",
        "(LogMonitor::update_from_paxos(bool*)+0x5c) [0x55ea058f21bc]",
        "(Monitor::refresh_from_paxos(bool*)+0x163) [0x55ea0586b783]",
        "(Monitor::preinit()+0x9af) [0x55ea05897a4f]",
        "main()",
        "__libc_start_main()",
        "_start()"
    ],
    "ceph_version": "17.2.5",
    "crash_id": "2022-12-08T09:13:51.228521Z_9f0ab405-757c-4f0a-9a17-e3d55f8e5986",
    "entity_name": "mon.mon3",
    "os_id": "11",
    "os_name": "Debian GNU/Linux 11 (bullseye)",
    "os_version": "11 (bullseye)",
    "os_version_id": "11",
    "process_name": "ceph-mon",
    "stack_sig": "72631d2013b9d940bdbdd12d61a624a39b276d13b89a663f155c77dcfcfc306a",
    "timestamp": "2022-12-08T09:13:51.228521Z",
    "utsname_hostname": "mon3",
    "utsname_machine": "x86_64",
    "utsname_release": "5.10.0-19-amd64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Debian 5.10.149-2 (2022-10-21)"
}
I'm pretty sure a piece of the mystery is in there, but I'm not able to work it out.
All the crash entries are identical.
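For what it's worth, I'm listing and inspecting them with the standard
crash commands, nothing exotic:

# ceph crash ls
# ceph crash info {crash-id}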
The logs don't help: what I see there is a lot of messages about
RocksDB, but none of them seems to point to an issue.
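If it would help, I can re-run the failing start with more verbose
monitor logging, e.g. by temporarily adding something like this to
/etc/ceph/ceph.conf on mon3 (the exact debug levels are just my guess):

[mon]
        debug mon = 20
        debug paxos = 20
        debug rocksdb = 10

and then share the resulting /var/log/ceph/ceph-mon.mon3.log (assuming
the default log location).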
I know it's hard, but a checklist to go through in cases like this would help a lot.
Thanks for any help.
Francesco