Hi all,
I'm in an evaluation stage, implementing a fully virtualized Ceph
Quincy test cluster.
I successfully deployed the first two mons and three OSDs; on the first
mon I also deployed the manager and the dashboard. All deployments
were carried out without any automation (Ansible or other tools), just
manually following docs.ceph.com.
The third monitor is driving me crazy!
I can successfully join mon3 to the cluster with:
# ceph-mon -i `hostname -s` --public-addr {monx-ip-address}
If I run ceph -s I get:
# ceph -s
  cluster:
    id:     787fbe49-8983-4f32-9817-5fc4d7370fd2
    health: HEALTH_WARN
            9 daemons have recently crashed

  services:
    mon: 3 daemons, quorum mon1,mon2,mon3 (age 14m)
    mgr: mon1(active, since 15h)
    osd: 9 osds: 9 up (since 15h), 9 in (since 6d)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 577 KiB
    usage:   123 MiB used, 288 GiB / 288 GiB avail
    pgs:     1 active+clean
When I enable automatic monitor startup with:
# systemctl enable ceph-mon.target
# systemctl enable ceph-mon@`hostname -s`
the start always fails.
Joining the cluster manually with the same command as above works.
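In case it helps narrow things down, this is roughly how I've been
comparing the two startup paths (paths assume the default cluster name
"ceph", so I may be looking in the wrong place):

# systemctl status ceph-mon@`hostname -s`
# journalctl -u ceph-mon@`hostname -s` -b --no-pager
# ls -ld /var/lib/ceph/mon/ceph-`hostname -s`
# ls -l /var/lib/ceph/mon/ceph-`hostname -s`/store.db | head

If I read the packaged unit file correctly, it starts ceph-mon with
--setuser ceph --setgroup ceph, while my manual start above runs as
root, so I'm also keeping an eye on file ownership under the mon data
directory.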
Let's talk about the failing daemons:
# ceph crash info 2022-12-08T09:13:51.228521Z_9f0ab405-757c-4f0a-9a17-e3d55f8e5986
{
    "backtrace": [
        "/lib/x86_64-linux-gnu/libpthread.so.0(+0x13140) [0x7f9e3e973140]",
        "gsignal()",
        "abort()",
        "/lib/x86_64-linux-gnu/libc.so.6(+0x2240f) [0x7f9e3e47c40f]",
        "/lib/x86_64-linux-gnu/libc.so.6(+0x31662) [0x7f9e3e48b662]",
        "(LogMonitor::log_external_backlog()+0xdfc) [0x55ea058ef77c]",
        "(LogMonitor::update_from_paxos(bool*)+0x5c) [0x55ea058f21bc]",
        "(Monitor::refresh_from_paxos(bool*)+0x163) [0x55ea0586b783]",
        "(Monitor::preinit()+0x9af) [0x55ea05897a4f]",
        "main()",
        "__libc_start_main()",
        "_start()"
    ],
    "ceph_version": "17.2.5",
    "crash_id": "2022-12-08T09:13:51.228521Z_9f0ab405-757c-4f0a-9a17-e3d55f8e5986",
    "entity_name": "mon.mon3",
    "os_id": "11",
    "os_name": "Debian GNU/Linux 11 (bullseye)",
    "os_version": "11 (bullseye)",
    "os_version_id": "11",
    "process_name": "ceph-mon",
    "stack_sig": "72631d2013b9d940bdbdd12d61a624a39b276d13b89a663f155c77dcfcfc306a",
    "timestamp": "2022-12-08T09:13:51.228521Z",
    "utsname_hostname": "mon3",
    "utsname_machine": "x86_64",
    "utsname_release": "5.10.0-19-amd64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Debian 5.10.149-2 (2022-10-21)"
}
I'm pretty sure a piece of the mystery is in there, but I'm not able to work it out.
All the crash entries are identical.
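For what it's worth, I'm listing and inspecting them with the standard
crash commands, nothing exotic:

# ceph crash ls
# ceph crash info {crash-id}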
The logs don't help: what I see there is a lot of messages about
RocksDB, but none of them seems to point to an issue.
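If it would help, I can re-run the failing start with more verbose
monitor logging, e.g. by temporarily adding something like this to
/etc/ceph/ceph.conf on mon3 (the exact debug levels are just my guess):

[mon]
        debug mon = 20
        debug paxos = 20
        debug rocksdb = 10

and then share the resulting /var/log/ceph/ceph-mon.mon3.log (assuming
the default log location).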
I know it's hard, but a checklist to go through in cases like this would help a lot.
Thanks for any help.
Francesco