Re: PG active+clean+degraded, but not creating new replicas

YIP Wai Peng <yipwp@xxxxxxxxxxxxxxx> · Tue, 4 Jun 2013 12:08:46 +0800

Sorry, to put things in context: I had some other problems last weekend, and setting the tunables to optimal helped (even though I am on the older kernel). Since that worked, I was inclined to believe that the tunables do take effect on the older kernel.

That being said, I will upgrade the kernel to see if this issue goes away.

Regards,
Wai Peng

On Tue, Jun 4, 2013 at 12:01 PM, YIP Wai Peng <yipwp@xxxxxxxxxxxxxxx> wrote:

Hi Sage,
It is on optimal tunables already. However, I'm on kernel 2.6.32-358.6.2.el6.x86_64. Will the tunables take effect or do I have to upgrade to something newer?

- WP

On Tue, Jun 4, 2013 at 11:58 AM, Sage Weil <sage@xxxxxxxxxxx> wrote:

On Tue, 4 Jun 2013, YIP Wai Peng wrote:

> Hi all,
>
> I'm running ceph on CentOS6 on 3 hosts, with 3 OSDs each (9 OSDs in total).
>
> When I increased one of my pools' replication size from 2 to 3, only 6 PGs got stuck in active+clean+degraded, and Ceph doesn't create the new replicas.

My first guess is that you do not have the newer crush tunables set and some placements are not quite right. If you are prepared for some data migration, and are not using an older kernel client, try

 ceph osd crush tunables optimal

sage
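
Sage's explanation (with older tunables "some placements are not quite right") can be illustrated with a toy model. The Ceph CRUSH tunables documentation describes a similar legacy symptom: when host buckets hold only a few OSDs, as here with 3 per host, some PGs can map to fewer replicas than requested. The Python sketch below is NOT the real CRUSH algorithm. It assumes, purely for illustration, that each replica is placed by up to `max_tries` pseudo-random host picks, where `max_tries` is a hypothetical stand-in for CRUSH's retry tunables. It shows how a small retry budget leaves some PGs short while a larger one does not.

```python
"""Toy model (NOT real CRUSH) of bounded retries leaving PGs short of replicas.

Hypothetical assumptions, for illustration only:
- 3 hosts, at most one replica per host (like `chooseleaf firstn 0 type host`).
- Each attempt picks a host from a hash of (pg, replica, attempt).
- `max_tries` loosely stands in for CRUSH retry tunables; real CRUSH has
  several tunables and a different retry structure.
"""
import hashlib

HOSTS = ["host1", "host2", "host3"]


def pick(pg: int, replica: int, attempt: int) -> str:
    # Deterministic pseudo-random choice, so results are reproducible.
    digest = hashlib.sha256(f"{pg}:{replica}:{attempt}".encode()).digest()
    return HOSTS[int.from_bytes(digest[:4], "big") % len(HOSTS)]


def place(pg: int, size: int, max_tries: int) -> list[str]:
    """Return the hosts chosen for one PG; may be shorter than `size`."""
    chosen: list[str] = []
    for replica in range(size):
        for attempt in range(max_tries):
            host = pick(pg, replica, attempt)
            if host not in chosen:  # enforce one replica per host
                chosen.append(host)
                break
        # If every attempt collided with an already-chosen host, this
        # replica is skipped and the PG ends up with fewer replicas.
    return chosen


def short_pgs(num_pgs: int, size: int, max_tries: int) -> int:
    """Count PGs whose toy mapping has fewer hosts than `size`."""
    return sum(1 for pg in range(num_pgs) if len(place(pg, size, max_tries)) < size)


if __name__ == "__main__":
    for tries in (2, 5, 50):
        print(f"max_tries={tries}: {short_pgs(1000, 3, tries)} of 1000 PGs short")
```

Running it, you should see many short PGs at `max_tries=2` and few or none at 50. In the real cluster the remedy is the tunables change suggested above, not anything in this model.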

> One of the problematic PGs has the following (snipped for brevity):
>
> { "state": "active+clean+degraded",
>   "epoch": 1329,
>   "up": [
>         4,
>         6],
>   "acting": [
>         4,
>         6],
> <snip>
>   "recovery_state": [
>         { "name": "Started\/Primary\/Active",
>           "enter_time": "2013-06-04 01:10:30.092977",
>           "might_have_unfound": [
>                 { "osd": 3,
>                   "status": "already probed"},
>                 { "osd": 5,
>                   "status": "not queried"},
>                 { "osd": 6,
>                   "status": "already probed"}],
> <snip>
>
> I tried force_create_pg, but the PG gets stuck in "creating". Any ideas on how to "kickstart" this PG so it creates the correct number of replicas?
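
The key symptom in the query output is that the acting set lists only two OSDs while the pool size is now 3. A minimal Python sketch that reads such `ceph pg <pgid> query` output and reports the shortfall is shown below. `PG_QUERY` is a trimmed copy of the snippet quoted above, with closing brackets added so it parses, and `summarize` is a hypothetical helper, not part of Ceph. It only extracts fields; it does not diagnose the cause.

```python
import json

# Trimmed copy of the quoted `ceph pg <pgid> query` output (closing brackets added).
PG_QUERY = """
{ "state": "active+clean+degraded",
  "epoch": 1329,
  "up": [4, 6],
  "acting": [4, 6],
  "recovery_state": [
    { "name": "Started/Primary/Active",
      "enter_time": "2013-06-04 01:10:30.092977",
      "might_have_unfound": [
        { "osd": 3, "status": "already probed"},
        { "osd": 5, "status": "not queried"},
        { "osd": 6, "status": "already probed"}]}]}
"""


def summarize(query: dict, pool_size: int) -> dict:
    """Report how many replicas the PG lacks and which OSDs were not queried."""
    acting = query["acting"]
    not_queried = [
        entry["osd"]
        for state in query.get("recovery_state", [])
        for entry in state.get("might_have_unfound", [])
        if entry.get("status") == "not queried"
    ]
    return {
        "state": query["state"],
        "acting": acting,
        "missing_replicas": max(pool_size - len(acting), 0),
        "not_queried": not_queried,
    }


if __name__ == "__main__":
    print(summarize(json.loads(PG_QUERY), pool_size=3))
    # → {'state': 'active+clean+degraded', 'acting': [4, 6], 'missing_replicas': 1, 'not_queried': [5]}
```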

> PS: I have the following crush rule for the pool, which makes the replicas go to different hosts.
> host1 has OSDs 0, 1, 2
> host2 has OSDs 3, 4, 5
> host3 has OSDs 6, 7, 8
> Looking at it, the new replica should go to one of OSDs 0, 1 or 2 (host1), but ceph is not creating it?
>
> rule different_host {
>         ruleset 3
>         type replicated
>         min_size 1
>         max_size 10
>         step take default
>         step chooseleaf firstn 0 type host
>         step emit
> }
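
The reasoning in the PS (one replica per host, so with up set [4, 6] on host2 and host3 the third copy belongs on host1) can be checked mechanically. The sketch below assumes the host-to-OSD layout listed above; `HOST_OSDS` and `check_placement` are hypothetical names. It does not evaluate the CRUSH rule itself. It only checks the one-replica-per-host outcome that `chooseleaf firstn 0 type host` is meant to produce, where `firstn 0` asks for as many replicas as the pool size.

```python
# Host layout taken from the message above (hypothetical variable name).
HOST_OSDS: dict[str, list[int]] = {
    "host1": [0, 1, 2],
    "host2": [3, 4, 5],
    "host3": [6, 7, 8],
}


def check_placement(up: list[int], host_osds: dict[str, list[int]], size: int) -> dict:
    """Check an up set against a one-replica-per-host expectation."""
    owner = {osd: host for host, osds in host_osds.items() for osd in osds}
    hosts = [owner[osd] for osd in up]
    missing_hosts = [h for h in host_osds if h not in hosts]
    return {
        # True if no two replicas share a host.
        "distinct_hosts": len(set(hosts)) == len(hosts),
        # How many replicas the PG is short of the pool size.
        "short_by": max(size - len(up), 0),
        # OSDs on hosts that hold no replica yet: where a new copy could go.
        "candidate_osds": [osd for h in missing_hosts for osd in host_osds[h]],
    }


if __name__ == "__main__":
    print(check_placement([4, 6], HOST_OSDS, size=3))
    # → {'distinct_hosts': True, 'short_by': 1, 'candidate_osds': [0, 1, 2]}
```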

> Any help will be much appreciated. Cheers
> - Wai Peng

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com