stale+incomplete pgs on new cluster

On Tue, Aug 19, 2014 at 3:36 PM, Gregory Farnum <greg at inktank.com> wrote:

> [Re-adding the list]
>
> On Tue, Aug 19, 2014 at 2:24 PM, Randy Smith <rbsmith at adams.edu> wrote:
> > Gregory,
> >
> > # ceph osd tree
> > # id    weight  type name       up/down reweight
> > -1      0.2     root default
> > -2      0.2             host cs00
> > 0       0.09999                 osd.0   up      1
> > 1       0.09999                 osd.1   up      1
> >
> > The min_size on the ruleset is 1.  Here's the relevant portion of my
> > crushmap.
>
> Sorry, by "min_size" I mean the member associated with the pool, not
> with the rule. You should be able to see it by looking at the pools
> section of "ceph osd dump", among others. I believe that by default it
> should be 2, and the full size is 3...which if you have 2 OSDs and
> aren't segregating across hosts should be good,


Ah, yes, of course. All of the pools were set to size 3 and min_size 2.
I've reset those to 2 and 1 since I only have two osds in this test.
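
Concretely, that is one "size" and one "min_size" setting per pool,
something like the following (pool names here are examples; adjust to
whatever "ceph osd dump" lists):

  # ceph osd pool set rbd size 2
  # ceph osd pool set rbd min_size 1

The new values then show up in the pool lines of "ceph osd dump":

  # ceph osd dump | grep ^pool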


> but I'm guessing your
> pools are using the default rules that will want to segregate across
> hosts.
>

I changed the crushmap to use osd rather than host in the
'replicated_ruleset' rule, which I thought would take care of that.
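
(For reference, the usual cycle for that kind of edit is: pull the map,
decompile it, change "step chooseleaf firstn 0 type host" to "type osd"
in the rule, recompile, and inject it. File names below are arbitrary.)

  # ceph osd getcrushmap -o crushmap.bin
  # crushtool -d crushmap.bin -o crushmap.txt
  # $EDITOR crushmap.txt
  # crushtool -c crushmap.txt -o crushmap.new
  # ceph osd setcrushmap -i crushmap.new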

I guess I missed something, since I'm still seeing the stale+incomplete pgs.
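
One way to sanity-check the edited rule is crushtool's test mode, run
against the compiled map (rule 0 with two replicas, matching the map
above):

  # crushtool -i crushmap.new --test --rule 0 --num-rep 2 --show-mappings

With only osd.0 and osd.1 in the tree, every reported mapping should
list both OSDs; any mapping that comes back with fewer than two means
the rule still isn't placing replicas the way the map intends.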


> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> >
> > # buckets
> > host cs00 {
> >         id -2           # do not change unnecessarily
> >         # weight 0.200
> >         alg straw
> >         hash 0  # rjenkins1
> >         item osd.0 weight 0.100
> >         item osd.1 weight 0.100
> > }
> > root default {
> >         id -1           # do not change unnecessarily
> >         # weight 0.200
> >         alg straw
> >         hash 0  # rjenkins1
> >         item cs00 weight 0.200
> > }
> >
> > # rules
> > rule replicated_ruleset {
> >         ruleset 0
> >         type replicated
> >         min_size 1
> >         max_size 10
> >         step take default
> >         step chooseleaf firstn 0 type osd
> >         step emit
> > }
> >
> > This is a single host "cluster" as it's just for testing.
> >
> >
> >
> > On Tue, Aug 19, 2014 at 2:44 PM, Gregory Farnum <greg at inktank.com>
> wrote:
> >>
> >> On Tue, Aug 19, 2014 at 1:37 PM, Randy Smith <rbsmith at adams.edu> wrote:
> >> > Greetings,
> >> >
> >> > I'm creating a new ceph cluster for testing and it's reporting "192
> >> > stale+incomplete" pgs.
> >> >
> >> > `ceph health detail` lists all of the pgs that are stuck. Here's a
> >> > representative line.
> >> >
> >> >   pg 2.2c is stuck stale for 3076.510998, current state
> >> > stale+incomplete,
> >> > last acting [0]
> >> >
> >> > But when I run `ceph pg 2.2c query`, I get
> >> >
> >> >   Error ENOENT: i don't have pgid 2.2c
> >> >
> >> > What can I do to clean up the cluster? I have no problems wiping the
> >> > system
> >> > because it's a new test cluster.
> >>
> >> Incomplete generally means the PGs aren't mapped to enough OSDs.
> >> What's the output of "ceph osd tree" and what is the min_size set to
> >> for your pools?
> >> -Greg
> >> Software Engineer #42 @ http://inktank.com | http://ceph.com
> >
> >
> >
> >
> > --
> > Randall Smith
> > Computing Services
> > Adams State University
> > http://www.adams.edu/
> > 719-587-7741
>



-- 
Randall Smith
Computing Services
Adams State University
http://www.adams.edu/
719-587-7741