2 pg in 'active+undersized+degraded' state

Hi,

 

I’m looking for some help figuring out why 2 PGs in our cluster are stuck in the 'active+undersized+degraded' state. They don’t seem to get a third OSD assigned to place data on, and I’m not sure why; everything looks ‘ok’ to me. Our Ceph cluster consists of 3 nodes and has been upgraded from Firefly to Hammer to Infernalis during its lifetime. Everything was fine until a few days ago, when I created a new pool with

# ceph osd pool create poolname 2048 replicated

and proceeded to create two rbd images

# rbd create --size 2T poolname/image1

# rbd create --size 1600G poolname/image2

 

Since that moment ceph health shows a warning:

    cluster 6318a6a2-808b-45a1-9c89-31575c58de49

     health HEALTH_WARN

            2 pgs degraded

            2 pgs stuck degraded

            2 pgs stuck unclean

            2 pgs stuck undersized

            2 pgs undersized

            recovery 389/9308133 objects degraded (0.004%)

     monmap e7: 4 mons at {md002=172.19.20.2:6789/0,md005=172.19.20.5:6789/0,md008=172.19.20.8:6789/0,md010=172.19.20.10:6789/0}

            election epoch 18774, quorum 0,1,2,3 md002,md005,md008,md010

     osdmap e105161: 30 osds: 30 up, 30 in

      pgmap v12776313: 2880 pgs, 5 pools, 12089 GB data, 3029 kobjects

            36771 GB used, 24661 GB / 61433 GB avail

            389/9308133 objects degraded (0.004%)

                2878 active+clean

                   2 active+undersized+degraded

  client io 1883 kB/s rd, 15070 B/s wr, 1 op/s

 

There is no recovery going on.

We are on version 9.2.1 on CentOS 7 with kernel 4.4.9, except for the monitoring node, which is still on 4.4.0.

 

I’ve used crushtool to check whether the mapping should be OK, and it seems fine (although I think that test assumes all nodes in the cluster are identical, which they are not in our situation).
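
For completeness, the kind of test I mean is along these lines; the rule id 0 and the 0–2047 range below are just placeholders, not necessarily this pool’s actual rule and pg_num:

# ceph osd getcrushmap -o /tmp/crushmap

# crushtool -i /tmp/crushmap --test --show-bad-mappings --rule 0 --num-rep 3 --min-x 0 --max-x 2047

(rule id and x range to be replaced with the values for the pool in question)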

There are no errors in the ceph logs (zgrep -i err *gz in /var/log/ceph).

The pool’s replica size is set to 3, and pg_num and pgp_num are both 2048 for this pool.
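
These settings can be double-checked directly against the pool (same pool name as in the create command above), e.g.:

# ceph osd pool get poolname size

# ceph osd pool get poolname min_size

# ceph osd pool get poolname pg_num

# ceph osd pool get poolname pgp_num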

 

The details:

HEALTH_WARN 2 pgs degraded; 2 pgs stuck degraded; 2 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs undersized; recovery 389/9308133 objects degraded (0.004%)

pg 24.17 is stuck unclean since forever, current state active+undersized+degraded, last acting [23,1]

pg 24.54a is stuck unclean since forever, current state active+undersized+degraded, last acting [8,19]

pg 24.17 is stuck undersized for 9653.439112, current state active+undersized+degraded, last acting [23,1]

pg 24.54a is stuck undersized for 9659.961863, current state active+undersized+degraded, last acting [8,19]

pg 24.17 is stuck degraded for 9653.439186, current state active+undersized+degraded, last acting [23,1]

pg 24.54a is stuck degraded for 9659.961940, current state active+undersized+degraded, last acting [8,19]

pg 24.54a is active+undersized+degraded, acting [8,19]

pg 24.17 is active+undersized+degraded, acting [23,1]

recovery 389/9308133 objects degraded (0.004%)

 

# ceph pg dump_stuck degraded

ok

pg_stat  state                       up      up_primary  acting  acting_primary
24.17    active+undersized+degraded  [23,1]  23          [23,1]  23
24.54a   active+undersized+degraded  [8,19]  8           [8,19]  8

 

# ceph pg map 24.17

osdmap e105161 pg 24.17 (24.17) -> up [23,1] acting [23,1]

# ceph pg map 24.54a

osdmap e105161 pg 24.54a (24.54a) -> up [8,19] acting [8,19]
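
If the full peering details would help, I can also post the output of, e.g.:

# ceph pg 24.17 query

# ceph pg 24.54a query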

 

The osd tree and crushmap can be found here: http://pastebin.com/i4BQq5Mi

 

I’m hoping for some insight into why this is happening. I couldn’t find much out there about undersized PG states, other than people trying to get a replication of 3 with fewer than 3 OSDs, or with fewer than 3 hosts when the CRUSH hierarchy requires host-level separation, but neither of those applies here.
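
In case it matters, the CRUSH rules (including the chooseleaf / failure-domain step) can be listed with the command below; I can post that output as well if it is useful:

# ceph osd crush rule dump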

 

Best regards,

 

 

Max

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
