Re: Cluster Health error's status

Michel Niyoyita <micou12@xxxxxxxxx> · Fri, 29 Oct 2021 10:34:20 +0200

Hello Etienne

Here is the ceph -s output:

root@ceph-mon1:~# ceph -s
  cluster:
    id:     43f5d6b4-74b0-4281-92ab-940829d3ee5e
    health: HEALTH_ERR
            1/3 mons down, quorum ceph-mon1,ceph-mon3
            14/47681 objects unfound (0.029%)
            1 scrub errors
            Possible data damage: 13 pgs recovery_unfound, 1 pg inconsistent
            Degraded data redundancy: 42/143043 objects degraded (0.029%), 13 pgs degraded
            2 slow ops, oldest one blocked for 2897 sec, daemons [osd.0,osd.7] have slow ops.

  services:
    mon: 3 daemons, quorum ceph-mon1,ceph-mon3 (age 2h), out of quorum: ceph-mon4
    mgr: ceph-mon1(active, since 25h), standbys: ceph-mon2
    osd: 12 osds: 12 up (since 97m), 12 in (since 25h); 10 remapped pgs

  data:
    pools:   5 pools, 225 pgs
    objects: 47.68k objects, 204 GiB
    usage:   603 GiB used, 4.1 TiB / 4.7 TiB avail
    pgs:     42/143043 objects degraded (0.029%)
             2460/143043 objects misplaced (1.720%)
             14/47681 objects unfound (0.029%)
             211 active+clean
             10  active+recovery_unfound+degraded+remapped
             3   active+recovery_unfound+degraded
             1   active+clean+inconsistent

  io:
    client:   2.0 KiB/s rd, 88 KiB/s wr, 2 op/s rd, 12 op/s wr
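
As a sanity check on the counters above, here is a minimal Python sketch
that recomputes the ratios ceph -s prints. The replica count of 3 is an
assumption inferred from the three-OSD acting sets in the health detail
output; it is not reported directly in this output. The variable names are
illustrative only.

```python
# Figures copied from the "ceph -s" output above.
objects = 47681            # "objects: 47.68k objects" / "14/47681 objects unfound"
unfound = 14               # unfound objects
degraded_copies = 42       # "42/143043 objects degraded"
misplaced_copies = 2460    # "2460/143043 objects misplaced"
total_copies = 143043      # denominator used for degraded/misplaced
unfound_pgs = 10 + 3       # PGs in a recovery_unfound state (remapped + not remapped)

# Assumption: replicated pools with size 3 (each acting set lists three OSDs).
replicas = 3


def pct(part, whole):
    """Return part/whole as a percentage rounded to three decimals."""
    return round(100.0 * part / whole, 3)


# The degraded/misplaced denominator matches objects * replicas.
print("total copies consistent:", total_copies == objects * replicas)

# Every unfound object accounts for one degraded copy per replica.
print("degraded copies consistent:", degraded_copies == unfound * replicas)

print("PGs with unfound objects:", unfound_pgs)
print("unfound   %:", pct(unfound, objects))
print("degraded  %:", pct(degraded_copies, total_copies))
print("misplaced %:", pct(misplaced_copies, total_copies))
```

If the replica assumption holds, the 42 degraded copies are fully explained
by the 14 unfound objects, so the degraded count should clear once the
unfound objects are recovered or otherwise resolved.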

On Fri, Oct 29, 2021 at 10:09 AM Etienne Menguy <etienne.menguy@xxxxxxxx> wrote:

> Hi,
>
> Please share “ceph -s” output.
>
> -
> Etienne Menguy
> etienne.menguy@xxxxxxxx
>
>
>
>
> On 29 Oct 2021, at 10:03, Michel Niyoyita <micou12@xxxxxxxxx> wrote:
>
> Hello team
>
> I am running a Ceph cluster with 3 monitors and 4 OSD nodes, each running
> 3 OSDs. I deployed the cluster with Ansible on Ubuntu 20.04, and the Ceph
> version is Octopus. Yesterday the server that hosts the OSD nodes
> restarted because of a power issue. Since it came back up, one of the
> monitors is out of quorum and some PGs are marked as damaged. Please help
> me solve this issue. The health detail output is below. Three of the four
> OSD nodes also run the monitors.
>
> Best regards.
>
> Michel
>
>
> root@ceph-mon1:~# ceph health detail
> HEALTH_ERR 1/3 mons down, quorum ceph-mon1,ceph-mon3; 14/47195 objects unfound (0.030%); Possible data damage: 13 pgs recovery_unfound; Degraded data redundancy: 42/141585 objects degraded (0.030%), 13 pgs degraded; 2 slow ops, oldest one blocked for 322 sec, daemons [osd.0,osd.7] have slow ops.
> [WRN] MON_DOWN: 1/3 mons down, quorum ceph-mon1,ceph-mon3
>    mon.ceph-mon4 (rank 2) addr [v2:10.10.29.154:3300/0,v1:10.10.29.154:6789/0] is down (out of quorum)
> [WRN] OBJECT_UNFOUND: 14/47195 objects unfound (0.030%)
>    pg 5.77 has 1 unfound objects
>    pg 5.6d has 2 unfound objects
>    pg 5.6a has 1 unfound objects
>    pg 5.65 has 1 unfound objects
>    pg 5.4a has 1 unfound objects
>    pg 5.30 has 1 unfound objects
>    pg 5.28 has 1 unfound objects
>    pg 5.25 has 1 unfound objects
>    pg 5.19 has 1 unfound objects
>    pg 5.1a has 1 unfound objects
>    pg 5.1 has 1 unfound objects
>    pg 5.b has 1 unfound objects
>    pg 5.8 has 1 unfound objects
> [ERR] PG_DAMAGED: Possible data damage: 13 pgs recovery_unfound
>    pg 5.1 is active+recovery_unfound+degraded+remapped, acting [5,8,7], 1 unfound
>    pg 5.8 is active+recovery_unfound+degraded+remapped, acting [6,11,8], 1 unfound
>    pg 5.b is active+recovery_unfound+degraded+remapped, acting [7,0,5], 1 unfound
>    pg 5.19 is active+recovery_unfound+degraded+remapped, acting [0,5,7], 1 unfound
>    pg 5.1a is active+recovery_unfound+degraded, acting [10,11,8], 1 unfound
>    pg 5.25 is active+recovery_unfound+degraded+remapped, acting [0,10,11], 1 unfound
>    pg 5.28 is active+recovery_unfound+degraded+remapped, acting [6,11,8], 1 unfound
>    pg 5.30 is active+recovery_unfound+degraded+remapped, acting [7,5,0], 1 unfound
>    pg 5.4a is active+recovery_unfound+degraded, acting [0,11,7], 1 unfound
>    pg 5.65 is active+recovery_unfound+degraded+remapped, acting [0,10,11], 1 unfound
>    pg 5.6a is active+recovery_unfound+degraded, acting [0,11,7], 1 unfound
>    pg 5.6d is active+recovery_unfound+degraded+remapped, acting [7,2,0], 2 unfound
>    pg 5.77 is active+recovery_unfound+degraded+remapped, acting [5,6,8], 1 unfound
> [WRN] PG_DEGRADED: Degraded data redundancy: 42/141585 objects degraded (0.030%), 13 pgs degraded
>    pg 5.1 is active+recovery_unfound+degraded+remapped, acting [5,8,7], 1 unfound
>    pg 5.8 is active+recovery_unfound+degraded+remapped, acting [6,11,8], 1 unfound
>    pg 5.b is active+recovery_unfound+degraded+remapped, acting [7,0,5], 1 unfound
>    pg 5.19 is active+recovery_unfound+degraded+remapped, acting [0,5,7], 1 unfound
>    pg 5.1a is active+recovery_unfound+degraded, acting [10,11,8], 1 unfound
>    pg 5.25 is active+recovery_unfound+degraded+remapped, acting [0,10,11], 1 unfound
>    pg 5.28 is active+recovery_unfound+degraded+remapped, acting [6,11,8], 1 unfound
>    pg 5.30 is active+recovery_unfound+degraded+remapped, acting [7,5,0], 1 unfound
>    pg 5.4a is active+recovery_unfound+degraded, acting [0,11,7], 1 unfound
>    pg 5.65 is active+recovery_unfound+degraded+remapped, acting [0,10,11], 1 unfound
>    pg 5.6a is active+recovery_unfound+degraded, acting [0,11,7], 1 unfound
>    pg 5.6d is active+recovery_unfound+degraded+remapped, acting [7,2,0], 2 unfound
>    pg 5.77 is active+recovery_unfound+degraded+remapped, acting [5,6,8], 1 unfound
> [WRN] SLOW_OPS: 2 slow ops, oldest one blocked for 322 sec, daemons [osd.0,osd.7] have slow ops.
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
>
>