Re: Ceph Down on Cluster


 



Hello Bruno,

I am having trouble understanding your outputs.

In the first 'ceph -s' output it says one mon is down, but your 'ceph health detail' does not report it any further.
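
To double-check the monitor side, something like this should work (standard ceph CLI; the last line assumes a systemd-based install, so adjust the unit name to your setup):

# ceph mon stat
# ceph quorum_status --format json-pretty
# systemctl status ceph-mon@<id>    # on the node whose mon is missing from the quorum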

In your crush map I count 7 OSDs (0, 1, 2, 3, 4, 6, 7), but 'ceph -s' reports only 6 OSDs (6 up, 6 in).

Can you send the output of 'ceph osd tree', 'ceph osd df' and 'ceph osd dump'?
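
For reference, plain commands like these are enough (redirecting to files is only a convenience for attaching):

# ceph osd tree > osd_tree.txt
# ceph osd df   > osd_df.txt
# ceph osd dump > osd_dump.txt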

Regards,
Goncalo


________________________________________
From: ceph-users [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of Bruno Silva [bemanuel.pe@xxxxxxxxx]
Sent: 19 November 2016 11:48
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re:  Ceph Down on Cluster

Hi, thanks.

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable straw_calc_version 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 device5
device 6 osd.6
device 7 osd.7

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host pxm00node01 {
id -2 # do not change unnecessarily
# weight 0.540
alg straw
hash 0 # rjenkins1
item osd.0 weight 0.540
}
host pmx00node03 {
id -3 # do not change unnecessarily
# weight 0.540
alg straw
hash 0 # rjenkins1
item osd.1 weight 0.540
}
host pxmnode04 {
id -4 # do not change unnecessarily
# weight 0.000
alg straw
hash 0 # rjenkins1
}
host pmx00node04 {
id -5 # do not change unnecessarily
# weight 0.530
alg straw
hash 0 # rjenkins1
item osd.2 weight 0.530
}
host pmx00node01 {
id -6 # do not change unnecessarily
# weight 1.080
alg straw
hash 0 # rjenkins1
item osd.6 weight 0.540
item osd.7 weight 0.540
}
host pmx00node02 {
id -7 # do not change unnecessarily
# weight 0.530
alg straw
hash 0 # rjenkins1
item osd.3 weight 0.530
}
host pmx00node05 {
id -8 # do not change unnecessarily
# weight 0.530
alg straw
hash 0 # rjenkins1
item osd.4 weight 0.530
}
root default {
id -1 # do not change unnecessarily
# weight 3.750
alg straw
hash 0 # rjenkins1
item pxm00node01 weight 0.540
item pmx00node03 weight 0.540
item pxmnode04 weight 0.000
item pmx00node04 weight 0.530
item pmx00node01 weight 1.080
item pmx00node02 weight 0.530
item pmx00node05 weight 0.530
}

# rules
rule replicated_ruleset {
ruleset 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}

# end crush map
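
The text above is the decompiled form of the map. For completeness, the usual round trip to dump, edit and re-inject it looks roughly like this (file names are just examples):

# ceph osd getcrushmap -o crush.bin
# crushtool -d crush.bin -o crush.txt     # decompile to the text form above
# crushtool -c crush.txt -o crush.new     # recompile after any edits
# ceph osd setcrushmap -i crush.new       # inject the edited map back into the cluster
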
On Fri, 18 Nov 2016 at 20:48, Brian :: <bc@xxxxxxxx> wrote:
Hi Bruno

Do you only have 6 OSDs across the 5 nodes?

You may have an issue with read or write errors on one OSD, and because
there aren't many other OSDs to go to, this is going to cause the
cluster pain. Post your crush map and the experts here may be able to
advise, but with a cluster of this size you may have trouble getting it
back to a healthy state if one OSD is causing problems...
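
A rough sketch of how to narrow it down (standard ceph CLI; replace <pgid> with the ids your cluster reports):

# ceph health detail          # names the PGs and the OSDs with blocked/slow requests
# ceph pg dump_stuck inactive
# ceph pg dump_stuck unclean
# ceph pg <pgid> query        # for each incomplete/down PG, shows which OSDs it is waiting for
# ceph osd perf               # per-OSD commit/apply latency, a failing disk usually stands out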



..



On Fri, Nov 18, 2016 at 10:51 PM, Bruno Silva <bemanuel.pe@xxxxxxxxx> wrote:
> I have a Ceph cluster with 5 nodes. For some reason the sync went down and now I
> don't know what I can do to restore it.
> # ceph -s
>     cluster 338bc0a5-c2f7-4c0a-9b35-25c7afee50c6
>      health HEALTH_WARN
>             1 pgs down
>             6 pgs incomplete
>             6 pgs stuck inactive
>             6 pgs stuck unclean
>             3 requests are blocked > 32 sec
>             1 mons down, quorum 0,1,2,3 0,2,1,3
>      monmap e5: 5 mons at
> {0=xyxyxyxyx:6789/0,1=xyxyxyxyx:6789/0,2=xyxyxyxyx:6789/0,3=1xyxyxyxyx:6789/0,4=xyxyxyxyx:6789/0}
>             election epoch 63162, quorum 0,1,2,3 0,2,1,3
>      osdmap e2575: 6 osds: 6 up, 6 in
>       pgmap v6105104: 128 pgs, 1 pools, 748 GB data, 188 kobjects
>             2217 GB used, 1072 GB / 3290 GB avail
>                  122 active+clean
>                    5 incomplete
>                    1 down+incomplete
>   client io 106 B/s wr, 0 op/s
>
>
> ceph -w
>     cluster 338bc0a5-c2f7-4c0a-9b35-25c7afee50c6
>      health HEALTH_WARN
>             1 pgs down
>             6 pgs incomplete
>             6 pgs stuck inactive
>             6 pgs stuck unclean
>             3 requests are blocked > 32 sec
>      monmap e5: 5 mons at
> {0=xyxyxyxyx:6789/0,1=xyxyxyxyx:6789/0,2=xyxyxyxyx:6789/0,3=xyxyxyxyx:6789/0,4=xyxyxyxyx:6789/0}
>             election epoch 63164, quorum 0,1,2,3,4 0,2,1,3,4
>      osdmap e2575: 6 osds: 6 up, 6 in
>       pgmap v6105130: 128 pgs, 1 pools, 748 GB data, 188 kobjects
>             2217 GB used, 1072 GB / 3290 GB avail
>                  122 active+clean
>                    5 incomplete
>                    1 down+incomplete
>   client io 1262 B/s wr, 0 op/s
>
> 2016-11-18 19:49:58.005806 mon.0 [INF] pgmap v6105130: 128 pgs: 1
> down+incomplete, 122 active+clean, 5 incomplete; 748 GB data, 2217 GB used,
> 1072 GB / 3290 GB avail; 1262 B/s wr, 0 op/s
> 2016-11-18 19:50:02.731566 mon.0 [INF] pgmap v6105131: 128 pgs: 1
> down+incomplete, 122 active+clean, 5 incomplete; 748 GB data, 2217 GB used,
> 1072 GB / 3290 GB avail; 1228 B/s wr, 0 op/s
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



