Re: PG active+clean+degraded, but not creating new replicas

YIP Wai Peng <yipwp@xxxxxxxxxxxxxxx> · Tue, 4 Jun 2013 13:23:04 +0800

Hi Sage,
Thanks, I noticed after re-reading the documentation.

I realized that osd.8 was not in host3. After adding osd.8 to host3, the PGs are now in the "active+remapped" state:

# ceph pg 3.45 query

{ "state": "active+remapped",
  "epoch": 1374,
  "up": [
        4,
        8],
  "acting": [
        4,
        8,
        6],
<snip>

Still, nothing is happening. What can be wrong?
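
To make the question concrete, here is a rough sketch (not a Ceph tool) that compares the "up" and "acting" sets from the query above against the host layout described further down in this thread. The `HOSTS` table and `POOL_SIZE` are assumptions taken from that description (with osd.8 now under host3), and `summarize` is a hypothetical helper name.

```python
import json

# Fragment of the `ceph pg 3.45 query` output quoted above (only the fields used here).
PG_QUERY = '{"state": "active+remapped", "epoch": 1374, "up": [4, 8], "acting": [4, 8, 6]}'

# Assumed layout, as described in the original post (osd.8 now under host3).
HOSTS = {"host1": [0, 1, 2], "host2": [3, 4, 5], "host3": [6, 7, 8]}
POOL_SIZE = 3  # replica count the pool was raised to (assumption from the thread)


def summarize(pg, hosts, pool_size):
    """Report how the up set falls short of the desired placement."""
    osd_to_host = {osd: host for host, osds in hosts.items() for osd in osds}
    up, acting = pg["up"], pg["acting"]
    up_hosts = {osd_to_host[osd] for osd in up}
    return {
        # How many replicas the up set is short of the pool size.
        "missing_from_up": pool_size - len(up),
        # OSDs still serving the PG that are not part of the up set.
        "acting_only": sorted(set(acting) - set(up)),
        # Hosts that contribute no OSD to the up set.
        "hosts_without_up_osd": sorted(set(hosts) - up_hosts),
    }


summary = summarize(json.loads(PG_QUERY), HOSTS, POOL_SIZE)
print(summary)
```

With the values above, the up set is one replica short, osd.6 appears only in the acting set, and host1 contributes no OSD to the up set, which suggests the third replica is still not being mapped to host1.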

- WP

On Tue, Jun 4, 2013 at 12:26 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:

On Tue, 4 Jun 2013, YIP Wai Peng wrote:
> Sorry, to set things in context, I had some other problems last weekend.
> Setting it to optimal tunables helped (although I am on the older kernel).
> Since it worked, I was inclined to believe that the tunables do work on the
> older kernel.
> That being said, I will upgrade the kernel to see if this issue goes away.

The kernel version is only an issue if you are using the cephfs or rbd
*client* from the kernel (e.g., rbd map ... or mount -t ceph ...).  (Ceph
didn't appear upstream until 2.6.35 or thereabouts, and fixes are only
backported as far as v3.4.)

sage

>
> Regards,
> Wai Peng
>
> On Tue, Jun 4, 2013 at 12:01 PM, YIP Wai Peng <yipwp@xxxxxxxxxxxxxxx> wrote:
> Hi Sage,
> It is on optimal tunables already. However, I'm on kernel
> 2.6.32-358.6.2.el6.x86_64. Will the tunables take effect or do I have
> to upgrade to something newer?
>
> - WP
>
> On Tue, Jun 4, 2013 at 11:58 AM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> On Tue, 4 Jun 2013, YIP Wai Peng wrote:
> > Hi all,
> > I'm running ceph on CentOS6 on 3 hosts, with 3 OSDs each (total 9 OSDs).
> > When I increased the rep size of one of my pools from 2 to 3, just 6 PGs
> > got stuck in active+clean+degraded mode, and ceph doesn't create new
> > replicas for them.
>
> My first guess is that you do not have the newer crush tunables set and
> some placements are not quite right.  If you are prepared for some data
> migration, and are not using an older kernel client, try
>
>  ceph osd crush tunables optimal
>
> sage
>
> > One of the problematic PGs has the following (snipped for brevity):
> >
> > { "state": "active+clean+degraded",
> >   "epoch": 1329,
> >   "up": [
> >         4,
> >         6],
> >   "acting": [
> >         4,
> >         6],
> > <snip>
> >   "recovery_state": [
> >         { "name": "Started\/Primary\/Active",
> >           "enter_time": "2013-06-04 01:10:30.092977",
> >           "might_have_unfound": [
> >                 { "osd": 3,
> >                   "status": "already probed"},
> >                 { "osd": 5,
> >                   "status": "not queried"},
> >                 { "osd": 6,
> >                   "status": "already probed"}],
> > <snip>
> >
> > I tried force_create_pg but it gets stuck in "creating". Any ideas on how to
> > "kickstart" this node to create the correct number of replicas?
> >
> > PS: I have the following crush rule for the pool, which makes the replicas
> > go to different hosts.
> > host1 has OSD 0,1,2
> > host2 has OSD 3,4,5
> > host3 has OSD 6,7,8
> > Looking at it, the new replica should be going to OSD 0,1,2, but ceph is not
> > creating it?
> >
> > rule different_host {
> >         ruleset 3
> >         type replicated
> >         min_size 1
> >         max_size 10
> >         step take default
> >         step chooseleaf firstn 0 type host
> >         step emit
> > }
> >
> > Any help will be much appreciated. Cheers
> > - Wai Peng
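
For readers following the thread: the `different_host` rule quoted above is meant to put each replica on a different host. The sketch below does not reproduce the CRUSH algorithm. It only checks whether a given OSD set meets that constraint, assuming the host layout listed in the original post. `HOSTS`, `hosts_with_multiple_replicas` and `satisfies_rule` are hypothetical names used for illustration.

```python
from collections import Counter

# Assumed layout, as listed in the original post.
HOSTS = {"host1": [0, 1, 2], "host2": [3, 4, 5], "host3": [6, 7, 8]}
OSD_TO_HOST = {osd: host for host, osds in HOSTS.items() for osd in osds}


def hosts_with_multiple_replicas(osds):
    """Return the hosts that hold more than one OSD from the given set."""
    counts = Counter(OSD_TO_HOST[osd] for osd in osds)
    return sorted(host for host, n in counts.items() if n > 1)


def satisfies_rule(osds, size):
    """True if the set has `size` OSDs and no two of them share a host."""
    return len(osds) == size and not hosts_with_multiple_replicas(osds)


# Sets seen in this thread, plus one hypothetical set that uses host1.
print(satisfies_rule([4, 6], 3))     # degraded PG: only two replicas
print(satisfies_rule([4, 8, 6], 3))  # acting set after the fix: 6 and 8 share host3
print(satisfies_rule([4, 6, 0], 3))  # hypothetical: one OSD per host
```

Under these assumptions only the last set satisfies the constraint, which matches the expectation in the original post that the third replica should land on one of OSD 0, 1 or 2.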

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com