stale+incomplete pgs on new cluster

[Re-adding the list]

On Tue, Aug 19, 2014 at 2:24 PM, Randy Smith <rbsmith at adams.edu> wrote:
> Gregory,
>
> # ceph osd tree
> # id    weight  type name       up/down reweight
> -1      0.2     root default
> -2      0.2             host cs00
> 0       0.09999                 osd.0   up      1
> 1       0.09999                 osd.1   up      1
>
> The min_size on the ruleset is 1.  Here's the relevant portion of my
> crushmap.

Sorry, by "min_size" I mean the value associated with the pool, not
with the rule. You should be able to see it in the pools section of
"ceph osd dump", among other places. I believe it defaults to 2, and
the full size is 3. With your 2 OSDs that should be workable as long as
you aren't segregating replicas across hosts, but I'm guessing your
pools are using the default rules, which do want to segregate across
hosts.
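
If that turns out to be the problem, something along these lines should
get a single-host test cluster healthy. This is illustrative only;
"rbd" stands in for whichever pool names "ceph osd dump" reports:

  # ceph osd dump | grep pool          <- check size/min_size per pool
  # ceph osd pool set rbd size 2       <- 2 copies for a 2-OSD box
  # ceph osd pool set rbd min_size 1   <- allow I/O with 1 copy up

Repeat the two "pool set" commands for each of your pools.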
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


>
> # buckets
> host cs00 {
>         id -2           # do not change unnecessarily
>         # weight 0.200
>         alg straw
>         hash 0  # rjenkins1
>         item osd.0 weight 0.100
>         item osd.1 weight 0.100
> }
> root default {
>         id -1           # do not change unnecessarily
>         # weight 0.200
>         alg straw
>         hash 0  # rjenkins1
>         item cs00 weight 0.200
> }
>
> # rules
> rule replicated_ruleset {
>         ruleset 0
>         type replicated
>         min_size 1
>         max_size 10
>         step take default
>         step chooseleaf firstn 0 type osd
>         step emit
> }
>
> This is a single-host "cluster" as it's just for testing.
>
>
>
> On Tue, Aug 19, 2014 at 2:44 PM, Gregory Farnum <greg at inktank.com> wrote:
>>
>> On Tue, Aug 19, 2014 at 1:37 PM, Randy Smith <rbsmith at adams.edu> wrote:
>> > Greetings,
>> >
>> > I'm creating a new ceph cluster for testing and it's reporting "192
>> > stale+incomplete" pgs.
>> >
>> > `ceph health detail` lists all of the pgs that are stuck. Here's a
>> > representative line.
>> >
>> >   pg 2.2c is stuck stale for 3076.510998, current state
>> >   stale+incomplete, last acting [0]
>> >
>> > But when I run `ceph pg 2.2c query`, I get
>> >
>> >   Error ENOENT: i don't have pgid 2.2c
>> >
>> > What can I do to clean up the cluster? I have no problems wiping
>> > the system because it's a new test cluster.
>>
>> Incomplete generally means the PGs aren't mapped to enough OSDs.
>> What's the output of "ceph osd tree" and what is the min_size set to
>> for your pools?
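>> For example:
>>
>>   # ceph osd pool get rbd min_size
>>
>> ("rbd" here is just an example; "ceph osd lspools" will list your
>> actual pool names.)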
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
>
>
> --
> Randall Smith
> Computing Services
> Adams State University
> http://www.adams.edu/
> 719-587-7741

