Thanks for your tips & tricks. The setup is now based on Ubuntu 12.04, Ceph version 0.80.1, still using 1 x mon and 3 x OSDs.

root@ceph-node2:~# ceph osd tree
# id    weight  type name           up/down reweight
-1      0       root default
-2      0               host ceph-node2
0       0                       osd.0   up      1
-3      0               host ceph-node3
1       0                       osd.1   up      1
-4      0               host ceph-node1
2       0                       osd.2   up      1

root@ceph-node2:~# ceph -s
    cluster c30e1410-fe1a-4924-9112-c7a5d789d273
     health HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck unclean
     monmap e1: 1 mons at {ceph-node1=192.168.123.48:6789/0}, election epoch 2, quorum 0 ceph-node1
     osdmap e11: 3 osds: 3 up, 3 in
      pgmap v18: 192 pgs, 3 pools, 0 bytes data, 0 objects
            102 MB used, 15224 MB / 15326 MB avail
                 192 incomplete

root@ceph-node2:~# cat mycrushmap.txt
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host ceph-node2 {
        id -2           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item osd.0 weight 0.000
}
host ceph-node3 {
        id -3           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item osd.1 weight 0.000
}
host ceph-node1 {
        id -4           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item osd.2 weight 0.000
}
root default {
        id -1           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0  # rjenkins1
        item ceph-node2 weight 0.000
        item ceph-node3 weight 0.000
        item ceph-node1 weight 0.000
}

# rules
rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

# end crush map

Is there anything wrong with it?

root@ceph-node2:~# ceph osd dump
epoch 11
fsid c30e1410-fe1a-4924-9112-c7a5d789d273
created 2014-05-23 15:16:57.772981
modified 2014-05-23 15:18:17.022152
flags
pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool stripe_width 0
pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool stripe_width 0
max_osd 3
osd.0 up   in  weight 1 up_from 4 up_thru 5 down_at 0 last_clean_interval [0,0) 192.168.123.49:6800/4714 192.168.123.49:6801/4714 192.168.123.49:6802/4714 192.168.123.49:6803/4714 exists,up bc991a4b-9e60-4759-b35a-7f58852aa804
osd.1 up   in  weight 1 up_from 8 up_thru 0 down_at 0 last_clean_interval [0,0) 192.168.123.50:6800/4685 192.168.123.50:6801/4685 192.168.123.50:6802/4685 192.168.123.50:6803/4685 exists,up bd099d83-2483-42b9-9dbc-7f4e4043ca60
osd.2 up   in  weight 1 up_from 11 up_thru 0 down_at 0 last_clean_interval [0,0) 192.168.123.53:6800/16807 192.168.123.53:6801/16807 192.168.123.53:6802/16807 192.168.123.53:6803/16807 exists,up 80a302d0-3493-4c39-b34b-5af233b32ba1

Thanks

From: ceph-users [mailto:ceph-users-bounces at lists.ceph.com] On Behalf Of Michael
Sent: Friday, 23 May 2014 12:36
To: ceph-users at lists.ceph.com
Subject: Re: pgs incomplete; pgs stuck inactive; pgs stuck unclean

64 PGs per pool shouldn't cause any issues while there are only 3 OSDs. It'll be something to pay attention to if a lot more get added, though. Your replication setup is probably set to something other than host. You'll want to extract your crush map, decompile it, and see whether your "step" is set to osd or rack. If it's not host, change it to host and pull it back in. Check the docs on crush maps at http://ceph.com/docs/master/rados/operations/crush-map/ for more info.

-Michael
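
For reference, the extract/decompile/edit/recompile/inject cycle described above is usually something along these lines (the file names here are only examples):

ceph osd getcrushmap -o crushmap.bin            # pull the compiled CRUSH map from the cluster
crushtool -d crushmap.bin -o crushmap.txt       # decompile it to editable text
# edit crushmap.txt so the replicated rule contains: step chooseleaf firstn 0 type host
crushtool -c crushmap.txt -o crushmap-new.bin   # recompile the edited map
ceph osd setcrushmap -i crushmap-new.bin        # inject it back into the cluster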

On 23/05/2014 10:53, Karan Singh wrote:

Try increasing the placement groups for the pools:

ceph osd pool set data pg_num 128
ceph osd pool set data pgp_num 128

Similarly for the other 2 pools as well.

- Karan
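
Going by the pool names in the osd dump above (data, metadata, rbd), doing the same for the other two pools would presumably be:

ceph osd pool set metadata pg_num 128
ceph osd pool set metadata pgp_num 128
ceph osd pool set rbd pg_num 128
ceph osd pool set rbd pgp_num 128

Note that pg_num has to be raised before pgp_num on each pool, since pgp_num cannot exceed pg_num.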