Re: Why is this pg incomplete?

Gregory Farnum <gfarnum@xxxxxxxxxx> · Mon, 4 Jan 2016 07:12:44 -0800



On Fri, Jan 1, 2016 at 12:15 PM, Bryan Wright <bkw1a@xxxxxxxxxxxx> wrote:
> Hi folks,
>
> "ceph pg dump_stuck inactive" shows:
>
> 0.e8    incomplete      [406,504]       406     [406,504]       406
>
> Each of the osds above is alive and well, and idle.
>
> The output of "ceph pg 0.e8 query" is shown below.  All of the osds it refers
> to are alive and well, with the exception of osd 102 which died and has been
> removed from the cluster.
>
> Can anyone look at this and tell me why this pg is incomplete?
>
> Bryan
>
> "ceph pg query" output is here, because it's so large:
>
> http://ayesha.phys.virginia.edu/~bryan/errant-pg.txt

I can't parse all of that output, but the most important and
easiest-to-understand bit is:
            "blocked_by": [
                102
            ],
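
For anyone who wants to pull that field out of a large query dump without reading all of it, here is a minimal Python sketch. It assumes the query output is JSON saved to a file, and it makes no assumption about where "blocked_by" is nested: it walks the whole structure. The function name find_blocked_by and the trimmed sample document are made up for illustration and are not the real output.

```python
import json


def find_blocked_by(node):
    """Collect every OSD id listed under any "blocked_by" key, at any depth."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "blocked_by" and isinstance(value, list):
                found.update(value)
            else:
                found |= find_blocked_by(value)
    elif isinstance(node, list):
        for item in node:
            found |= find_blocked_by(item)
    return found


# Hypothetical, heavily trimmed stand-in for real "ceph pg 0.e8 query" output.
sample = json.loads("""
{
  "state": "incomplete",
  "info": {"stats": {"blocked_by": [102]}}
}
""")

print(sorted(find_blocked_by(sample)))
```

With real data you would replace the sample with something like json.load(open("pg.json")), where pg.json is a hypothetical file holding the saved query output.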

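And indeed, the past_intervals section contains a bunch of intervals where
the only OSD listed is 102. With min_size 1 the PG could keep accepting
writes while 102 was the only OSD serving it, so the surviving OSDs can't be
sure they have the latest data. You really want min_size >= 2 for exactly
this reason. :/ If you can get 102 back up, things should recover; if you
can't, you can mark it as "lost" and RADOS ought to resume processing, with
potential data/metadata loss...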
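
To make the past_intervals point concrete, here is a toy Python model of the check, not Ceph's actual peering logic: it flags any past interval whose acting set contains no OSD that is still available. The interval values and the field names ("first", "last", "acting") are assumptions chosen for illustration and are not copied from the query output. Real peering considers more than this, for example whether an interval could have accepted writes at all.

```python
def unrecoverable_intervals(past_intervals, available_osds):
    """Return intervals whose acting set has no OSD that is still available."""
    available = set(available_osds)
    return [iv for iv in past_intervals if not available & set(iv["acting"])]


# Made-up intervals for illustration; not taken from the actual query output.
past_intervals = [
    {"first": 100, "last": 120, "acting": [102, 406]},
    {"first": 121, "last": 150, "acting": [102]},
    {"first": 151, "last": 160, "acting": [406, 504]},
]

# OSD 102 is gone; only 406 and 504 can be asked about the PG's history.
stuck = unrecoverable_intervals(past_intervals, available_osds=[406, 504])
for iv in stuck:
    print(f"epochs {iv['first']}-{iv['last']}: only {iv['acting']} held the PG")
```

In this toy data the middle interval is the problem: nothing that is still around saw the PG during those epochs, which is the situation min_size >= 2 is meant to prevent.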
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com