Re: cluster ceph -s error


 



Hi David,

Apologies for the late response.

nodeB is mon+client, nodeC is client.



Ceph health detail:

HEALTH_ERR 819 pgs are stuck inactive for more than 300 seconds; 883 pgs degraded; 64 pgs stale; 819 pgs stuck inactive; 1064 pgs stuck unclean; 883 pgs undersized; 22 requests are blocked > 32 sec; 3 osds have slow requests; recovery 2/8 objects degraded (25.000%); recovery 2/8 objects misplaced (25.000%); crush map has legacy tunables (require argonaut, min is firefly); crush map has straw_calc_version=0
pg 2.fc is stuck inactive since forever, current state undersized+degraded+peered, last acting [2]
pg 2.fd is stuck inactive since forever, current state undersized+degraded+peered, last acting [0]
pg 2.fe is stuck inactive since forever, current state undersized+degraded+peered, last acting [2]
pg 2.ff is stuck inactive since forever, current state undersized+degraded+peered, last acting [1]
pg 1.fb is stuck inactive for 493857.572982, current state undersized+degraded+peered, last acting [4]
pg 2.f8 is stuck inactive since forever, current state undersized+degraded+peered, last acting [3]
pg 1.fa is stuck inactive for 492185.443146, current state undersized+degraded+peered, last acting [0]
pg 2.f9 is stuck inactive since forever, current state undersized+degraded+peered, last acting [0]
pg 1.f9 is stuck inactive for 492185.452890, current state undersized+degraded+peered, last acting [2]
pg 2.fa is stuck inactive since forever, current state undersized+degraded+peered, last acting [3]
pg 1.f8 is stuck inactive for 492185.443324, current state undersized+degraded+peered, last acting [0]
pg 2.fb is stuck inactive since forever, current state undersized+degraded+peered, last acting [2]
.
.
.

pg 1.fb is undersized+degraded+peered, acting [4]
pg 2.ff is undersized+degraded+peered, acting [1]
pg 2.fe is undersized+degraded+peered, acting [2]
pg 2.fd is undersized+degraded+peered, acting [0]
pg 2.fc is undersized+degraded+peered, acting [2]
3 ops are blocked > 536871 sec on osd.4
15 ops are blocked > 268435 sec on osd.4
1 ops are blocked > 262.144 sec on osd.4
2 ops are blocked > 268435 sec on osd.3
1 ops are blocked > 268435 sec on osd.1
3 osds have slow requests
recovery 2/8 objects degraded (25.000%)
recovery 2/8 objects misplaced (25.000%)
crush map has legacy tunables (require argonaut, min is firefly); see http://ceph.com/docs/master/rados/operations/crush-map/#tunables
crush map has straw_calc_version=0; see http://ceph.com/docs/master/rados/operations/crush-map/#tunables
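
For the two tunables warnings at the end, a hedged suggestion: the cluster is still on argonaut-era CRUSH tunables and straw_calc_version=0. If every client and kernel mounting the cluster supports firefly tunables, something like the following should clear both warnings (verify client compatibility first, and expect some data movement):

```shell
# Raise CRUSH tunables to the firefly profile (only safe if all
# clients/kernels connecting to the cluster support firefly tunables).
ceph osd crush tunables firefly

# Clear the straw_calc_version=0 warning; this recalculates straw
# bucket weights and may trigger a small amount of rebalancing.
ceph osd crush set-tunable straw_calc_version 1
```

Neither of these will fix the inactive/undersized PGs, though; those come from the five down OSDs on nodeB.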


ceph osd stat

cluster-admin@nodeB:~/.ssh/ceph-cluster$ cat ceph_osd_stat.txt
     osdmap e80: 10 osds: 5 up, 5 in; 558 remapped pgs
            flags sortbitwise


ceph osd tree:

cluster-admin@nodeB:~/.ssh/ceph-cluster$ ceph osd tree
ID WEIGHT  TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 9.08691 root default
-2 4.54346     host nodeB
 5 0.90869         osd.5     down        0          1.00000
 6 0.90869         osd.6     down        0          1.00000
 7 0.90869         osd.7     down        0          1.00000
 8 0.90869         osd.8     down        0          1.00000
 9 0.90869         osd.9     down        0          1.00000
-3 4.54346     host nodeC
 0 0.90869         osd.0       up  1.00000          1.00000
 1 0.90869         osd.1       up  1.00000          1.00000
 2 0.90869         osd.2       up  1.00000          1.00000
 3 0.90869         osd.3       up  1.00000          1.00000
 4 0.90869         osd.4       up  1.00000          1.00000
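
All five OSDs on nodeB are down while all of nodeC's are up, which explains the undersized+degraded+peered PGs: with a replicated rule that places one copy per host, no PG can reach full size until nodeB's OSDs come back. A first step, assuming a systemd-managed release (unit names are the standard `ceph-osd@<id>` ones; on an upstart-based release it would be `sudo start ceph-osd id=5` instead):

```shell
# On nodeB: check why one OSD daemon died, read its recent log,
# then try restarting all five.
systemctl status ceph-osd@5
journalctl -u ceph-osd@5 --no-pager | tail -50
for id in 5 6 7 8 9; do sudo systemctl restart ceph-osd@$id; done
```

If the daemons won't stay up, the journal/log output is the thing to post back to the list.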




CrushMap:


# begin crush map

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host nodeB {
        id -2           # do not change unnecessarily
        # weight 4.543
        alg straw
        hash 0  # rjenkins1
        item osd.5 weight 0.909
        item osd.6 weight 0.909
        item osd.7 weight 0.909
        item osd.8 weight 0.909
        item osd.9 weight 0.909
}
host nodeC {
        id -3           # do not change unnecessarily
        # weight 4.543
        alg straw
        hash 0  # rjenkins1
        item osd.0 weight 0.909
        item osd.1 weight 0.909
        item osd.2 weight 0.909
        item osd.3 weight 0.909
        item osd.4 weight 0.909
}
root default {
        id -1           # do not change unnecessarily
        # weight 9.087
        alg straw
        hash 0  # rjenkins1
        item nodeB weight 4.543
        item nodeC weight 4.543
}

# rules
rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

# end crush map
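
The map itself looks sane: `step chooseleaf firstn 0 type host` with two hosts means each PG should get one copy on nodeB and one on nodeC, which is exactly why everything degrades when one host's OSDs are down. If you want to confirm that independently of the daemons, crushtool can simulate placements offline. A sketch, assuming the decompiled map above is saved as crush.txt (the file name is illustrative):

```shell
# Compile the decompiled map, then simulate placement for rule 0
# with 2 replicas and show a sample of the resulting mappings.
crushtool -c crush.txt -o crush.bin
crushtool -i crush.bin --test --rule 0 --num-rep 2 --show-mappings | head
```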



ceph.conf


cluster-admin@nodeB:~/.ssh/ceph-cluster$ cat /etc/ceph/ceph.conf
[global]
fsid = a04e9846-6c54-48ee-b26f-d6949d8bacb4
mon_initial_members = nodeB
mon_host = <mon IP>
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public_network = X.X.X.0/24





On Sat, Jun 18, 2016 at 12:15 PM, David <dclistslinux@xxxxxxxxx> wrote:

Is this a test cluster that has never been healthy, or a working cluster which has just gone unhealthy? Have you changed anything? Are all hosts, drives and network links working? More detail please. Any/all of the following would help:

ceph health detail
ceph osd stat
ceph osd tree
Your ceph.conf
Your crushmap

On 17 Jun 2016 14:14, "Ishmael Tsoaela" <ishmaelt3@xxxxxxxxx> wrote:
>
> Hi All,
>
> please assist to fix the error:
>
> 1 X admin
> 2 X admin(hosting admin as well)
>
> 4 osd each node

Please provide more detail; this suggests you should have 12 OSDs, but your osd map shows 10, 5 of which are down.
>
>
> cluster a04e9846-6c54-48ee-b26f-d6949d8bacb4
>      health HEALTH_ERR
>             819 pgs are stuck inactive for more than 300 seconds
>             883 pgs degraded
>             64 pgs stale
>             819 pgs stuck inactive
>             245 pgs stuck unclean
>             883 pgs undersized
>             17 requests are blocked > 32 sec
>             recovery 2/8 objects degraded (25.000%)
>             recovery 2/8 objects misplaced (25.000%)
>             crush map has legacy tunables (require argonaut, min is firefly)
>             crush map has straw_calc_version=0
>      monmap e1: 1 mons at {nodeB=155.232.195.4:6789/0}
>             election epoch 7, quorum 0 nodeB
>      osdmap e80: 10 osds: 5 up, 5 in; 558 remapped pgs
>             flags sortbitwise
>       pgmap v480: 1064 pgs, 3 pools, 6454 bytes data, 4 objects
>             25791 MB used, 4627 GB / 4652 GB avail
>             2/8 objects degraded (25.000%)
>             2/8 objects misplaced (25.000%)
>                  819 undersized+degraded+peered
>                  181 active
>                   64 stale+active+undersized+degraded
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
