Re: cluster ceph -s error


 



Hi David,

Apologies for the late response.

nodeB is mon+client, nodeC is client.



Ceph health detail:

HEALTH_ERR 819 pgs are stuck inactive for more than 300 seconds; 883 pgs degraded; 64 pgs stale; 819 pgs stuck inactive; 1064 pgs stuck unclean; 883 pgs undersized; 22 requests are blocked > 32 sec; 3 osds have slow requests; recovery 2/8 objects degraded (25.000%); recovery 2/8 objects misplaced (25.000%); crush map has legacy tunables (require argonaut, min is firefly); crush map has straw_calc_version=0
pg 2.fc is stuck inactive since forever, current state undersized+degraded+peered, last acting [2]
pg 2.fd is stuck inactive since forever, current state undersized+degraded+peered, last acting [0]
pg 2.fe is stuck inactive since forever, current state undersized+degraded+peered, last acting [2]
pg 2.ff is stuck inactive since forever, current state undersized+degraded+peered, last acting [1]
pg 1.fb is stuck inactive for 493857.572982, current state undersized+degraded+peered, last acting [4]
pg 2.f8 is stuck inactive since forever, current state undersized+degraded+peered, last acting [3]
pg 1.fa is stuck inactive for 492185.443146, current state undersized+degraded+peered, last acting [0]
pg 2.f9 is stuck inactive since forever, current state undersized+degraded+peered, last acting [0]
pg 1.f9 is stuck inactive for 492185.452890, current state undersized+degraded+peered, last acting [2]
pg 2.fa is stuck inactive since forever, current state undersized+degraded+peered, last acting [3]
pg 1.f8 is stuck inactive for 492185.443324, current state undersized+degraded+peered, last acting [0]
pg 2.fb is stuck inactive since forever, current state undersized+degraded+peered, last acting [2]
.
.
.

pg 1.fb is undersized+degraded+peered, acting [4]
pg 2.ff is undersized+degraded+peered, acting [1]
pg 2.fe is undersized+degraded+peered, acting [2]
pg 2.fd is undersized+degraded+peered, acting [0]
pg 2.fc is undersized+degraded+peered, acting [2]
3 ops are blocked > 536871 sec on osd.4
15 ops are blocked > 268435 sec on osd.4
1 ops are blocked > 262.144 sec on osd.4
2 ops are blocked > 268435 sec on osd.3
1 ops are blocked > 268435 sec on osd.1
3 osds have slow requests
recovery 2/8 objects degraded (25.000%)
recovery 2/8 objects misplaced (25.000%)
crush map has legacy tunables (require argonaut, min is firefly); see http://ceph.com/docs/master/rados/operations/crush-map/#tunables
crush map has straw_calc_version=0; see http://ceph.com/docs/master/rados/operations/crush-map/#tunables
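
For the two tunables warnings at the end, a hedged suggestion: the cluster is still on argonaut-era CRUSH tunables and straw_calc_version=0. If every client and kernel mounting the cluster supports firefly tunables, something like the following should clear both warnings (verify client compatibility first, and expect some data movement):

```shell
# Raise CRUSH tunables to the firefly profile (only safe if all
# clients/kernels connecting to the cluster support firefly tunables).
ceph osd crush tunables firefly

# Clear the straw_calc_version=0 warning; this recalculates straw
# bucket weights and may trigger a small amount of rebalancing.
ceph osd crush set-tunable straw_calc_version 1
```

Neither of these will fix the inactive/undersized PGs, though; those come from the five down OSDs on nodeB.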


ceph osd stat

cluster-admin@nodeB:~/.ssh/ceph-cluster$ cat ceph_osd_stat.txt
     osdmap e80: 10 osds: 5 up, 5 in; 558 remapped pgs
            flags sortbitwise


ceph osd tree:

cluster-admin@nodeB:~/.ssh/ceph-cluster$ ceph osd tree
ID WEIGHT  TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 9.08691 root default
-2 4.54346     host nodeB
 5 0.90869         osd.5     down        0          1.00000
 6 0.90869         osd.6     down        0          1.00000
 7 0.90869         osd.7     down        0          1.00000
 8 0.90869         osd.8     down        0          1.00000
 9 0.90869         osd.9     down        0          1.00000
-3 4.54346     host nodeC
 0 0.90869         osd.0       up  1.00000          1.00000
 1 0.90869         osd.1       up  1.00000          1.00000
 2 0.90869         osd.2       up  1.00000          1.00000
 3 0.90869         osd.3       up  1.00000          1.00000
 4 0.90869         osd.4       up  1.00000          1.00000
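
All five OSDs on nodeB are down while all of nodeC's are up, which explains the undersized+degraded+peered PGs: with a replicated rule that places one copy per host, no PG can reach full size until nodeB's OSDs come back. A first step, assuming a systemd-managed release (unit names are the standard `ceph-osd@<id>` ones; on an upstart-based release it would be `sudo start ceph-osd id=5` instead):

```shell
# On nodeB: check why one OSD daemon died, read its recent log,
# then try restarting all five.
systemctl status ceph-osd@5
journalctl -u ceph-osd@5 --no-pager | tail -50
for id in 5 6 7 8 9; do sudo systemctl restart ceph-osd@$id; done
```

If the daemons won't stay up, the journal/log output is the thing to post back to the list.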




CrushMap:


# begin crush map

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host nodeB {
        id -2           # do not change unnecessarily
        # weight 4.543
        alg straw
        hash 0  # rjenkins1
        item osd.5 weight 0.909
        item osd.6 weight 0.909
        item osd.7 weight 0.909
        item osd.8 weight 0.909
        item osd.9 weight 0.909
}
host nodeC {
        id -3           # do not change unnecessarily
        # weight 4.543
        alg straw
        hash 0  # rjenkins1
        item osd.0 weight 0.909
        item osd.1 weight 0.909
        item osd.2 weight 0.909
        item osd.3 weight 0.909
        item osd.4 weight 0.909
}
root default {
        id -1           # do not change unnecessarily
        # weight 9.087
        alg straw
        hash 0  # rjenkins1
        item nodeB weight 4.543
        item nodeC weight 4.543
}

# rules
rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

# end crush map
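
The map itself looks sane: `step chooseleaf firstn 0 type host` with two hosts means each PG should get one copy on nodeB and one on nodeC, which is exactly why everything degrades when one host's OSDs are down. If you want to confirm that independently of the daemons, crushtool can simulate placements offline. A sketch, assuming the decompiled map above is saved as crush.txt (the file name is illustrative):

```shell
# Compile the decompiled map, then simulate placement for rule 0
# with 2 replicas and show a sample of the resulting mappings.
crushtool -c crush.txt -o crush.bin
crushtool -i crush.bin --test --rule 0 --num-rep 2 --show-mappings | head
```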



ceph.conf


cluster-admin@nodeB:~/.ssh/ceph-cluster$ cat /etc/ceph/ceph.conf
[global]
fsid = a04e9846-6c54-48ee-b26f-d6949d8bacb4
mon_initial_members = nodeB
mon_host = <mon IP>
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public_network = X.X.X.0/24





On Sat, Jun 18, 2016 at 12:15 PM, David <dclistslinux@xxxxxxxxx> wrote:

Is this a test cluster that has never been healthy, or a working cluster which has just gone unhealthy? Have you changed anything? Are all hosts, drives and network links working? More detail please. Any/all of the following would help:

ceph health detail
ceph osd stat
ceph osd tree
Your ceph.conf
Your crushmap

On 17 Jun 2016 14:14, "Ishmael Tsoaela" <ishmaelt3@xxxxxxxxx> wrote:
>
> Hi All,
>
> please assist to fix the error:
>
> 1 X admin
> 2 X admin(hosting admin as well)
>
> 4 osd each node

Please provide more detail; this suggests you should have 12 OSDs, but your osd map shows 10, 5 of which are down.
>
>
> cluster a04e9846-6c54-48ee-b26f-d6949d8bacb4
>      health HEALTH_ERR
>             819 pgs are stuck inactive for more than 300 seconds
>             883 pgs degraded
>             64 pgs stale
>             819 pgs stuck inactive
>             245 pgs stuck unclean
>             883 pgs undersized
>             17 requests are blocked > 32 sec
>             recovery 2/8 objects degraded (25.000%)
>             recovery 2/8 objects misplaced (25.000%)
>             crush map has legacy tunables (require argonaut, min is firefly)
>             crush map has straw_calc_version=0
>      monmap e1: 1 mons at {nodeB=155.232.195.4:6789/0}
>             election epoch 7, quorum 0 nodeB
>      osdmap e80: 10 osds: 5 up, 5 in; 558 remapped pgs
>             flags sortbitwise
>       pgmap v480: 1064 pgs, 3 pools, 6454 bytes data, 4 objects
>             25791 MB used, 4627 GB / 4652 GB avail
>             2/8 objects degraded (25.000%)
>             2/8 objects misplaced (25.000%)
>                  819 undersized+degraded+peered
>                  181 active
>                   64 stale+active+undersized+degraded
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
