Re: Cluster Health error status

Also, what does the crush rule for pool 5 look like, and what is the failure-domain?
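
For example, something along these lines should show the rule and its failure-domain (the pool and rule names below are placeholders; look them up with "ceph osd pool ls detail"):

# map pool 5 to its crush rule
ceph osd pool ls detail | grep "^pool 5 "
ceph osd pool get <pool-name> crush_rule
# dump the rule; the "type" in its chooseleaf step is the failure-domain
ceph osd crush rule dump <rule-name>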


Quoting Etienne Menguy <etienne.menguy@xxxxxxxx>:

With “ceph pg x.y query” you can check why it’s complaining.

x.y for pg id, like 5.77
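
For example, against one of the unfound PGs listed further down; the "recovery_state" section of the output should show which OSDs the PG still wants to probe for the missing objects:

ceph pg 5.77 query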

It would also be interesting to check why the mon fails to rejoin quorum; it may give you hints about your OSD issues.
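
If the mon daemon is running, its log should say why it cannot rejoin. Assuming a non-containerized ceph-ansible deployment (adjust the unit name if you run containers), something like this on the ceph-mon4 node:

systemctl status ceph-mon@ceph-mon4
journalctl -u ceph-mon@ceph-mon4 --since "2 hours ago"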

-
Etienne Menguy
etienne.menguy@xxxxxxxx




On 29 Oct 2021, at 10:34, Michel Niyoyita <micou12@xxxxxxxxx> wrote:

Hello Etienne

This is the ceph -s output

root@ceph-mon1:~# ceph -s
  cluster:
    id:     43f5d6b4-74b0-4281-92ab-940829d3ee5e
    health: HEALTH_ERR
            1/3 mons down, quorum ceph-mon1,ceph-mon3
            14/47681 objects unfound (0.029%)
            1 scrub errors
            Possible data damage: 13 pgs recovery_unfound, 1 pg inconsistent
            Degraded data redundancy: 42/143043 objects degraded (0.029%), 13 pgs degraded
            2 slow ops, oldest one blocked for 2897 sec, daemons [osd.0,osd.7] have slow ops.

  services:
    mon: 3 daemons, quorum ceph-mon1,ceph-mon3 (age 2h), out of quorum: ceph-mon4
    mgr: ceph-mon1(active, since 25h), standbys: ceph-mon2
    osd: 12 osds: 12 up (since 97m), 12 in (since 25h); 10 remapped pgs

  data:
    pools:   5 pools, 225 pgs
    objects: 47.68k objects, 204 GiB
    usage:   603 GiB used, 4.1 TiB / 4.7 TiB avail
    pgs:     42/143043 objects degraded (0.029%)
             2460/143043 objects misplaced (1.720%)
             14/47681 objects unfound (0.029%)
             211 active+clean
             10  active+recovery_unfound+degraded+remapped
             3   active+recovery_unfound+degraded
             1   active+clean+inconsistent

  io:
    client:   2.0 KiB/s rd, 88 KiB/s wr, 2 op/s rd, 12 op/s wr

On Fri, Oct 29, 2021 at 10:09 AM Etienne Menguy <etienne.menguy@xxxxxxxx> wrote:
Hi,

Please share “ceph -s” output.

-
Etienne Menguy
etienne.menguy@xxxxxxxx




On 29 Oct 2021, at 10:03, Michel Niyoyita <micou12@xxxxxxxxx> wrote:

Hello team

I am running a Ceph cluster with 3 monitors and 4 OSD nodes running 3 OSDs each. I deployed the cluster with Ansible on Ubuntu 20.04, and the Ceph version is Octopus. Yesterday the servers hosting the OSD nodes restarted because of a power issue, and since they came back one of the monitors is out of quorum and some PGs are marked as damaged. Please help me to solve this issue; the health detail output is below. Note that 3 of the 4 OSD nodes are the same machines that run the monitors.

Best regards.

Michel


root@ceph-mon1:~# ceph health detail
HEALTH_ERR 1/3 mons down, quorum ceph-mon1,ceph-mon3; 14/47195 objects unfound (0.030%); Possible data damage: 13 pgs recovery_unfound; Degraded data redundancy: 42/141585 objects degraded (0.030%), 13 pgs degraded; 2 slow ops, oldest one blocked for 322 sec, daemons [osd.0,osd.7] have slow ops.
[WRN] MON_DOWN: 1/3 mons down, quorum ceph-mon1,ceph-mon3
   mon.ceph-mon4 (rank 2) addr [v2:10.10.29.154:3300/0,v1:10.10.29.154:6789/0] is down (out of quorum)
[WRN] OBJECT_UNFOUND: 14/47195 objects unfound (0.030%)
   pg 5.77 has 1 unfound objects
   pg 5.6d has 2 unfound objects
   pg 5.6a has 1 unfound objects
   pg 5.65 has 1 unfound objects
   pg 5.4a has 1 unfound objects
   pg 5.30 has 1 unfound objects
   pg 5.28 has 1 unfound objects
   pg 5.25 has 1 unfound objects
   pg 5.19 has 1 unfound objects
   pg 5.1a has 1 unfound objects
   pg 5.1 has 1 unfound objects
   pg 5.b has 1 unfound objects
   pg 5.8 has 1 unfound objects
[ERR] PG_DAMAGED: Possible data damage: 13 pgs recovery_unfound
   pg 5.1 is active+recovery_unfound+degraded+remapped, acting [5,8,7], 1 unfound
   pg 5.8 is active+recovery_unfound+degraded+remapped, acting [6,11,8], 1 unfound
   pg 5.b is active+recovery_unfound+degraded+remapped, acting [7,0,5], 1 unfound
   pg 5.19 is active+recovery_unfound+degraded+remapped, acting [0,5,7], 1 unfound
   pg 5.1a is active+recovery_unfound+degraded, acting [10,11,8], 1 unfound
   pg 5.25 is active+recovery_unfound+degraded+remapped, acting [0,10,11], 1 unfound
   pg 5.28 is active+recovery_unfound+degraded+remapped, acting [6,11,8], 1 unfound
   pg 5.30 is active+recovery_unfound+degraded+remapped, acting [7,5,0], 1 unfound
   pg 5.4a is active+recovery_unfound+degraded, acting [0,11,7], 1 unfound
   pg 5.65 is active+recovery_unfound+degraded+remapped, acting [0,10,11], 1 unfound
   pg 5.6a is active+recovery_unfound+degraded, acting [0,11,7], 1 unfound
   pg 5.6d is active+recovery_unfound+degraded+remapped, acting [7,2,0], 2 unfound
   pg 5.77 is active+recovery_unfound+degraded+remapped, acting [5,6,8], 1 unfound
[WRN] PG_DEGRADED: Degraded data redundancy: 42/141585 objects degraded (0.030%), 13 pgs degraded
   pg 5.1 is active+recovery_unfound+degraded+remapped, acting [5,8,7], 1 unfound
   pg 5.8 is active+recovery_unfound+degraded+remapped, acting [6,11,8], 1 unfound
   pg 5.b is active+recovery_unfound+degraded+remapped, acting [7,0,5], 1 unfound
   pg 5.19 is active+recovery_unfound+degraded+remapped, acting [0,5,7], 1 unfound
   pg 5.1a is active+recovery_unfound+degraded, acting [10,11,8], 1 unfound
   pg 5.25 is active+recovery_unfound+degraded+remapped, acting [0,10,11], 1 unfound
   pg 5.28 is active+recovery_unfound+degraded+remapped, acting [6,11,8], 1 unfound
   pg 5.30 is active+recovery_unfound+degraded+remapped, acting [7,5,0], 1 unfound
   pg 5.4a is active+recovery_unfound+degraded, acting [0,11,7], 1 unfound
   pg 5.65 is active+recovery_unfound+degraded+remapped, acting [0,10,11], 1 unfound
   pg 5.6a is active+recovery_unfound+degraded, acting [0,11,7], 1 unfound
   pg 5.6d is active+recovery_unfound+degraded+remapped, acting [7,2,0], 2 unfound
   pg 5.77 is active+recovery_unfound+degraded+remapped, acting [5,6,8], 1 unfound
[WRN] SLOW_OPS: 2 slow ops, oldest one blocked for 322 sec, daemons [osd.0,osd.7] have slow ops.


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



