Re: ceph orch command hung

Hi,

you should unpause your cluster (ceph osd unpause) so all services can read and write again. The other flags are probably also safe to unset since all OSDs seem to be up.
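
For example (untested on your cluster, but these are the standard commands; run them on a node with an admin keyring):

  ceph osd unpause              # clears the pauserd and pausewr flags
  ceph osd unset noout
  ceph osd unset nodown
  ceph osd unset nobackfill
  ceph osd unset norebalance
  ceph osd unset norecover

Once the flags are cleared, check ceph -s again; the MDS and the orchestrator should be able to make progress at that point.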

Regards,
Eugen

Quoting Taku Izumi <kgh02017.g@xxxxxxxxx>:

Hi all,



I have a 4-node Ceph cluster.

After I shut down my cluster, I tried to start it again, but the ceph orch xxx commands (such as status) hang.



How should I recover from this problem?



root@ceph-manager:/# ceph orch status   ==> hung

^CInterrupted



root@ceph-manager:/# ceph status

  cluster:

    id:     4588ed80-352b-11ee-9eae-157ca4325420

    health: HEALTH_ERR

            2 failed cephadm daemon(s)

            1 filesystem is degraded

            1 filesystem is offline

            pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover flag(s) set

            10 slow ops, oldest one blocked for 3736 sec, mon.ceph-osd0 has slow ops



  services:

    mon: 4 daemons, quorum ceph-manager,ceph-osd0,ceph-osd1,ceph-osd2 (age 64m)

    mgr: ceph-manager.kurjlh(active, since 64m), standbys: ceph-osd0.jodevs

    mds: 0/1 daemons up (1 failed), 2 standby

    osd: 3 osds: 3 up (since 64m), 3 in (since 2w)

         flags pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover



  data:

    volumes: 0/1 healthy, 1 failed

    pools:   11 pools, 243 pgs

    objects: 3.01k objects, 9.4 GiB

    usage:   28 GiB used, 2.8 TiB / 2.8 TiB avail

    pgs:     243 active+clean



root@ceph-manager:/# ceph health detail

HEALTH_ERR 2 failed cephadm daemon(s); 1 filesystem is degraded; 1 filesystem is offline; pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover flag(s) set; 10 slow ops, oldest one blocked for 3741 sec, mon.ceph-osd0 has slow ops

[WRN] CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)

    daemon rgw.sno_rgw.ceph-manager.umzmku on ceph-manager is in error state

    daemon rgw.sno_rgw.ceph-osd2.vfpmbs on ceph-osd2 is in error state

[WRN] FS_DEGRADED: 1 filesystem is degraded

    fs sno_cephfs is degraded

[ERR] MDS_ALL_DOWN: 1 filesystem is offline

    fs sno_cephfs is offline because no MDS is active for it.

[WRN] OSDMAP_FLAGS: pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover flag(s) set

[WRN] SLOW_OPS: 10 slow ops, oldest one blocked for 3741 sec, mon.ceph-osd0 has slow ops

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


