Unable to start monitor as a daemon

Hi all,

I'm at the evaluation stage, implementing a fully virtualized Ceph Quincy test cluster.

I successfully deployed the first two mons and three OSDs; on the first mon I also deployed the manager and the dashboard. All deployments were carried out without any automation (Ansible or other tools), just manually following docs.ceph.com.

The third monitor is driving me crazy!

I can successfully join mon3 to the cluster with:


# ceph-mon -i `hostname -s` --public-addr {monx-ip-address}


If I run ceph -s I get:


# ceph -s
  cluster:
    id:     787fbe49-8983-4f32-9817-5fc4d7370fd2
    health: HEALTH_WARN
            9 daemons have recently crashed

  services:
    mon: 3 daemons, quorum mon1,mon2,mon3 (age 14m)
    mgr: mon1(active, since 15h)
    osd: 9 osds: 9 up (since 15h), 9 in (since 6d)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 577 KiB
    usage:   123 MiB used, 288 GiB / 288 GiB avail
    pgs:     1 active+clean


But when I enable automatic monitor startup with:


# systemctl enable ceph-mon.target
# systemctl enable ceph-mon@`hostname -s`


the daemon always fails to start.
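
For the record, this is roughly how I inspect the failure (the unit name assumes the stock ceph-mon@.service shipped with the Ceph Debian packages):


# systemctl start ceph-mon@`hostname -s`
# systemctl status ceph-mon@`hostname -s`
# journalctl -xeu ceph-mon@`hostname -s`
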

Joining the cluster manually with the same command as above still works.
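
If it helps, I also compared the unit's ExecStart with my manual command and checked the ownership of the mon data directory (assuming the stock ceph-mon@.service, which as far as I can tell runs the daemon as the ceph user via --setuser ceph --setgroup ceph, and the default mon data path; my manual run above was done as root):


# systemctl cat ceph-mon@`hostname -s`
# ls -ln /var/lib/ceph/mon/ceph-`hostname -s`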


Let's look at the crashed daemons:


# ceph crash info 2022-12-08T09:13:51.228521Z_9f0ab405-757c-4f0a-9a17-e3d55f8e5986
{
    "backtrace": [
        "/lib/x86_64-linux-gnu/libpthread.so.0(+0x13140) [0x7f9e3e973140]",
        "gsignal()",
        "abort()",
        "/lib/x86_64-linux-gnu/libc.so.6(+0x2240f) [0x7f9e3e47c40f]",
        "/lib/x86_64-linux-gnu/libc.so.6(+0x31662) [0x7f9e3e48b662]",
        "(LogMonitor::log_external_backlog()+0xdfc) [0x55ea058ef77c]",
        "(LogMonitor::update_from_paxos(bool*)+0x5c) [0x55ea058f21bc]",
        "(Monitor::refresh_from_paxos(bool*)+0x163) [0x55ea0586b783]",
        "(Monitor::preinit()+0x9af) [0x55ea05897a4f]",
        "main()",
        "__libc_start_main()",
        "_start()"
    ],
    "ceph_version": "17.2.5",
    "crash_id": "2022-12-08T09:13:51.228521Z_9f0ab405-757c-4f0a-9a17-e3d55f8e5986",
    "entity_name": "mon.mon3",
    "os_id": "11",
    "os_name": "Debian GNU/Linux 11 (bullseye)",
    "os_version": "11 (bullseye)",
    "os_version_id": "11",
    "process_name": "ceph-mon",
    "stack_sig": "72631d2013b9d940bdbdd12d61a624a39b276d13b89a663f155c77dcfcfc306a",
    "timestamp": "2022-12-08T09:13:51.228521Z",
    "utsname_hostname": "mon3",
    "utsname_machine": "x86_64",
    "utsname_release": "5.10.0-19-amd64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Debian 5.10.149-2 (2022-10-21)"
}


I'm pretty sure a piece of the mystery is in there, but I can't figure it out. All crash entries are identical.
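
For completeness, I listed the entries with (output omitted, since they all look the same):


# ceph crash ls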


The logs don't help either: there are lots of messages about RocksDB, but none of them seems to point to an issue.
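
Concretely, I have been looking at the mon log on mon3 and at the journal (the log path below assumes the default location):


# tail -n 200 /var/log/ceph/ceph-mon.mon3.log
# journalctl -u ceph-mon@`hostname -s` --no-pager | tail -n 200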


I know it's hard, but a checklist to go through in cases like this would help a lot.


Thanks for any help.

Francesco

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



