Re: pgs stuck unclean on a new pool despite the pool size reconfiguration

Hi Warren,

a simple:
ceph osd pool set bench2 hashpspool false
solved my problem.
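
For anyone hitting the same issue, the change can be verified with something like this (the grep pattern is only an example; the pool id may differ):
ceph osd dump | grep "pool 8 'bench2'"
ceph health detail
The hashpspool flag should no longer appear on the pool line, and the stuck PGs should go active+clean shortly afterwards.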

Thanks a lot
Giuseppe

2015-10-02 16:18 GMT+02:00 Warren Wang - ISD <Warren.Wang@xxxxxxxxxxx>:
You probably don’t want hashpspool automatically set, since your clients may still not understand that crush map feature. You can try to unset it for that pool and see what happens, or create a new pool without hashpspool enabled from the start.  Just a guess.
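
If you go the "new pool" route, I believe the relevant default is osd_pool_default_flag_hashpspool (double-check the option name for your release, this is untested on my side). Something like the following in ceph.conf on the monitors, before creating the pool, should give you pools without the flag:

[global]
osd pool default flag hashpspool = false

You can then confirm with "ceph osd dump | grep <poolname>" that the flag is not set.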

Warren

From: Giuseppe Civitella <giuseppe.civitella@xxxxxxxxx>
Date: Friday, October 2, 2015 at 10:05 AM
To: ceph-users <ceph-users@xxxxxxxx>
Subject: pgs stuck unclean on a new pool despite the pool size reconfiguration

Hi all,
I have a Firefly cluster which has been upgraded from Emperor.
It has 2 OSD hosts and 3 monitors.
The cluster uses the default values for the pools' size and min_size.
Once upgraded to Firefly, I created a new pool called bench2:
ceph osd pool create bench2 128 128
and set its sizes:
ceph osd pool set bench2 size 2
ceph osd pool set bench2 min_size 1
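
For reference, the values can be read back with:
ceph osd pool get bench2 size
ceph osd pool get bench2 min_size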

this is the state of the pools:
pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 stripe_width 0
pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 stripe_width 0
pool 3 'volumes' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 384 pgp_num 384 last_change 2568 stripe_width 0
        removed_snaps [1~75]
pool 4 'images' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 384 pgp_num 384 last_change 1895 stripe_width 0
pool 8 'bench2' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 2580 flags hashpspool stripe_width 0

Despite this, I still get a warning about 128 pgs stuck unclean.
"ceph health detail" shows me the stuck PGs, so I picked one to see which OSDs are involved:

pg 8.38 is stuck unclean since forever, current state active, last acting [22,7]
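
For more detail on why the PG stays unclean, it can also be queried directly (output omitted here):
ceph pg 8.38 query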

If I restart the OSD with id 22, PG 8.38 goes active+clean.
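(The exact restart command depends on the init system: with sysvinit it is typically "service ceph restart osd.22", with upstart "restart ceph-osd id=22".)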

AFAIK this is incorrect behavior: the cluster should pick up the new size and min_size values without any manual intervention. So my questions are: any idea why this happens and how to restore the expected behavior? Do I need to restart all of the OSDs to get back to a healthy state?

thanks a lot
Giuseppe


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
