Re: ceph orch status hangs forever

Hi,

HEALTH_WARN 2 failed cephadm daemon(s)
[WRN] CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
    daemon mon.spsrc-mon-1-safe on spsrc-mon-1 is in error state
    daemon mon.spsrc-mon-2-safe on spsrc-mon-2 is in error state

I don't think these containers are crucial, right? I did ask a while ago:
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/MQM46KBC3BNACYZWW37CGUHMNLTZQUTF/

I think you're right, but I'm not sure. Your cluster seems to be healthy except for those "-safe" daemons; once the orchestrator is responsive again you could try removing one of them. Which containers are running on the MON nodes? Can you share that (podman ps or docker ps)? The failing 'ceph crash' command could be caused by missing or failed crash containers on those nodes, so that is worth checking. Also, which MGR modules are enabled? There have been a couple of reports on this list of a module blocking requests, so please share the list of enabled modules as well.
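For example, something along these lines (just a sketch, not verified against your setup) should show the container state and the enabled MGR modules on each MON node:

    podman ps --format '{{.Names}}  {{.Status}}'   # or: docker ps
    cephadm ls                                     # per-daemon state as cephadm sees it
    systemctl list-units 'ceph-*' --state=failed   # any failed units, e.g. crash or the "-safe" mons
    cephadm shell -- ceph mgr module ls            # enabled / always-on MGR modules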

Here are the commands I tried to inspect the logs:
grep -i health -r /var/log/ceph/
grep -i error -r /var/log/ceph/

Did you search within a container, too?
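If not, note that with cephadm the daemon logs go to the journal rather than /var/log/ceph on the host. A rough sketch for the two failing daemons (using the FSID from your status output; adjust names as needed):

    cephadm logs --name mon.spsrc-mon-1-safe
    journalctl -u 'ceph-3cdbf59a-a74b-11ea-93cc-f0d4e2e6643c@mon.spsrc-mon-1-safe.service' --since '1 hour ago'

The journalctl unit name assumes the default cephadm naming scheme (ceph-<fsid>@<daemon>.service).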


Quoting Sebastian Luna Valero <sebastian.luna.valero@xxxxxxxxx>:

Hi,

Here it is:
# cephadm shell -- ceph status
Using recent ceph image 172.16.3.146:4000/ceph/ceph:v15.2.9
  cluster:
    id:     3cdbf59a-a74b-11ea-93cc-f0d4e2e6643c
    health: HEALTH_WARN
            2 failed cephadm daemon(s)

  services:
    mon: 3 daemons, quorum spsrc-mon-1,spsrc-mon-2,spsrc-mon-3 (age 7d)
    mgr: spsrc-mon-1.eziiam(active, since 7d), standbys: spsrc-mon-2.ilbncj, spsrc-mon-3.vzwxfr
    mds: manila:1 {0=manila.spsrc-mon-2.syveaq=up:active} 2 up:standby
    osd: 248 osds: 248 up (since 2w), 248 in (since 3M)

  data:
    pools:   6 pools, 257 pgs
    objects: 4.77M objects, 5.9 TiB
    usage:   12 TiB used, 1.3 PiB / 1.3 PiB avail
    pgs:     257 active+clean

Also:
# cephadm shell -- ceph health detail
Using recent ceph image 172.16.3.146:4000/ceph/ceph:v15.2.9
HEALTH_WARN 2 failed cephadm daemon(s)
[WRN] CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
    daemon mon.spsrc-mon-1-safe on spsrc-mon-1 is in error state
    daemon mon.spsrc-mon-2-safe on spsrc-mon-2 is in error state

I don't think these containers are crucial, right? I did ask a while ago:
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/MQM46KBC3BNACYZWW37CGUHMNLTZQUTF/

On all 3 Ceph monitor nodes, "systemctl status ceph\*.service" reports the
services as OK.

Here are the commands I tried to inspect the logs:
grep -i health -r /var/log/ceph/
grep -i error -r /var/log/ceph/

I get:
ceph_volume.exceptions.ConfigurationError: Unable to load expected Ceph config at: /etc/ceph/ceph.conf

But I think that's expected in a containerised deployment?

Do you suggest other commands?

Many thanks,
Sebastian


On Wed, 19 May 2021 at 21:49, Eugen Block <eblock@xxxxxx> wrote:

Hi,

can you paste the ceph status?
The orchestrator is an MGR module. Have you checked whether the containers
are up and running (assuming it's cephadm based)? Do the logs also
report the cluster as healthy?

Quoting Sebastian Luna Valero <sebastian.luna.valero@xxxxxxxxx>:

> Hi,
>
> After an unscheduled power outage our Ceph (Octopus) cluster reports a
> healthy state with "ceph status". However, when we run "ceph orch status"
> the command hangs forever.
>
> Are there other commands that we can run for a more thorough health check
> of the cluster?
>
> After looking at:
> https://docs.ceph.com/en/octopus/rados/operations/health-checks/
>
> I also ran "ceph crash ls-new" but it hangs forever as well.
>
> Any ideas?
>
> Our Ceph cluster is currently used as backend storage for our OpenStack
> cluster, and we are also having issues with storage volumes attached to
> VMs, but we don't know how to narrow down the root cause.
>
> Any feedback is highly appreciated.
>
> Best regards,
> Sebastian
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx




_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



