Re: ceph orch status hangs forever

Sebastian Luna Valero <sebastian.luna.valero@xxxxxxxxx> · Thu, 20 May 2021 07:58:20 +0200

Hi,

Here it is:
# cephadm shell -- ceph status
Using recent ceph image 172.16.3.146:4000/ceph/ceph:v15.2.9
  cluster:
    id:     3cdbf59a-a74b-11ea-93cc-f0d4e2e6643c
    health: HEALTH_WARN
            2 failed cephadm daemon(s)

  services:
    mon: 3 daemons, quorum spsrc-mon-1,spsrc-mon-2,spsrc-mon-3 (age 7d)
    mgr: spsrc-mon-1.eziiam(active, since 7d), standbys: spsrc-mon-2.ilbncj, spsrc-mon-3.vzwxfr
    mds: manila:1 {0=manila.spsrc-mon-2.syveaq=up:active} 2 up:standby
    osd: 248 osds: 248 up (since 2w), 248 in (since 3M)

  data:
    pools:   6 pools, 257 pgs
    objects: 4.77M objects, 5.9 TiB
    usage:   12 TiB used, 1.3 PiB / 1.3 PiB avail
    pgs:     257 active+clean

Also:
# cephadm shell -- ceph health detail
Using recent ceph image 172.16.3.146:4000/ceph/ceph:v15.2.9
HEALTH_WARN 2 failed cephadm daemon(s)
[WRN] CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
    daemon mon.spsrc-mon-1-safe on spsrc-mon-1 is in error state
    daemon mon.spsrc-mon-2-safe on spsrc-mon-2 is in error state
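
A minimal sketch for pulling the failed daemon names and their hosts out of that output, so each one can be inspected individually. It assumes the "daemon <name> on <host> is in error state" line format shown above. The function name list_failed_daemons is my own and hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "ceph health detail" text on stdin and print
# "<daemon-name> <host>" for every line of the form
#   "    daemon <name> on <host> is in error state"
list_failed_daemons() {
  awk '$1 == "daemon" && $3 == "on" && /is in error state/ { print $2, $4 }'
}

# Example:
#   cephadm shell -- ceph health detail | list_failed_daemons
```

Each name printed could then be passed to something like "cephadm logs --name <name>" on the corresponding host to view that daemon's journal.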

I don't think these containers are crucial, are they? I asked about them a while ago:
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/MQM46KBC3BNACYZWW37CGUHMNLTZQUTF/

On all 3 Ceph monitor nodes, "systemctl status ceph\*.service" reports the
services as OK.

Here are the commands I tried to inspect the logs:
grep -i health -r /var/log/ceph/
grep -i error -r /var/log/ceph/
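
The two greps above can be combined into a single case-insensitive pass that also prints line numbers. A minimal sketch; the function name scan_ceph_logs is hypothetical, and the default directory is simply the one used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: search a log directory recursively for "health" OR
# "error" (case-insensitive), printing file:line:text for each match.
# Defaults to /var/log/ceph, as in the commands above.
scan_ceph_logs() {
  local dir="${1:-/var/log/ceph}"
  grep -rniE 'health|error' -- "$dir"
}
```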

I get:
ceph_volume.exceptions.ConfigurationError: Unable to load expected Ceph config at: /etc/ceph/ceph.conf

But I think that's expected in a containerised deployment?

Can you suggest any other commands?

Many thanks,
Sebastian

On Wed, 19 May 2021 at 21:49, Eugen Block <eblock@xxxxxx> wrote:

> Hi,
>
> can you paste the ceph status?
> The orchestrator is a MGR module; have you checked whether the containers
> are up and running (assuming it's cephadm based)? Do the logs also
> report the cluster as healthy?
>
> Quoting Sebastian Luna Valero <sebastian.luna.valero@xxxxxxxxx>:
>
> > Hi,
> >
> > After an unscheduled power outage, our Ceph (Octopus) cluster reports a
> > healthy state with "ceph status". However, when we run "ceph orch status",
> > the command hangs forever.
> >
> > Are there other commands that we can run for a more thorough health check
> > of the cluster?
> >
> > After looking at:
> > https://docs.ceph.com/en/octopus/rados/operations/health-checks/
> >
> > I also ran "ceph crash ls-new", but it hangs forever as well.
> >
> > Any ideas?
> >
> > Our Ceph cluster is currently used as backend storage for our OpenStack
> > cluster, and we are also having issues with storage volumes attached to
> > VMs, but we don't know how to narrow down the root cause.
> >
> > Any feedback is highly appreciated.
> >
> > Best regards,
> > Sebastian
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>