PG active+clean+degraded, but not creating new replicas

Hi all,

I'm running ceph on CentOS6 on 3 hosts, with 3 OSD each (total 9 OSD).

When I increased the rep size of one of my pools from 2 to 3, 6 PGs got stuck in the active+clean+degraded state, and Ceph never creates the new replicas.
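For reference, this is roughly what I ran (pool name is just an example here):

```shell
# bump the replica count on the pool from 2 to 3
ceph osd pool set <poolname> size 3

# cluster status then showed the 6 PGs as active+clean+degraded
ceph -s
```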

One of the problematic PGs has the following state (snipped for brevity):

{ "state": "active+clean+degraded",
  "epoch": 1329,
  "up": [
        4,
        6],
  "acting": [
        4,
        6],
<snip>
  "recovery_state": [
        { "name": "Started\/Primary\/Active",
          "enter_time": "2013-06-04 01:10:30.092977",
          "might_have_unfound": [
                { "osd": 3,
                  "status": "already probed"},
                { "osd": 5,
                  "status": "not queried"},
                { "osd": 6,
                  "status": "already probed"}],
<snip>


I tried force_create_pg, but the PG then gets stuck in "creating". Any ideas on how to kick-start things so the correct number of replicas gets created?
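In case it helps, these are the standard CLI commands I've been using to inspect the situation (pg id replaced with a placeholder):

```shell
# list the PGs that are stuck unclean (the 6 degraded ones show up here)
ceph pg dump_stuck unclean

# full state of one stuck PG, including recovery_state as quoted above
ceph pg <pgid> query

# confirm all 9 OSDs are up and in, spread over the 3 hosts
ceph osd tree
```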


PS: I have the following CRUSH rule for the pool, which places replicas on different hosts.
host1 has OSDs 0,1,2
host2 has OSDs 3,4,5
host3 has OSDs 6,7,8
Since the stuck PG is on OSDs 4 and 6 (host2 and host3), the new replica should be going to OSD 0, 1 or 2 on host1, but Ceph is not creating it?

rule different_host {
        ruleset 3
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
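To spell out my reasoning: a minimal sketch of how I understand "chooseleaf firstn 0 type host" (assuming it simply picks at most one OSD per distinct host — the host names and layout below are my cluster's, the code itself is just an illustration, not Ceph's actual CRUSH implementation):

```python
# Simplified model: each replica must land on a distinct host.
HOSTS = {
    "host1": [0, 1, 2],
    "host2": [3, 4, 5],
    "host3": [6, 7, 8],
}

def hosts_of(osds):
    """Map a list of OSD ids back to the hosts that contain them."""
    return {host for host, members in HOSTS.items() for o in osds if o in members}

def missing_hosts(acting):
    """Hosts that hold no replica of the PG yet."""
    return set(HOSTS) - hosts_of(acting)

# The stuck PG is acting on OSDs 4 (host2) and 6 (host3);
# with size=3 the remaining replica should land on host1.
print(missing_hosts([4, 6]))  # → {'host1'}
```

(As far as I know the real mapping can also be checked offline with crushtool, e.g. "crushtool --test -i <compiled-map> --rule 3 --num-rep 3 --show-mappings".)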


Any help will be much appreciated. Cheers
- Wai Peng
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
