Re: all pgs of erasure coded pool stuck stale

Somebody else will need to do the diagnosis, but it'll help them if
you can capture OSD logs with "debug ms = 1" and "debug osd = 20" set.
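
If it helps, one way to bump those settings on the running OSDs (assuming
the mons are reachable and you have the usual admin keyring) is something
like:

  ceph tell osd.* injectargs '--debug-ms 1 --debug-osd 20'

or put them in the [osd] section of ceph.conf and restart the daemons:

  [osd]
      debug ms = 1
      debug osd = 20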

Based on the required features update in the crush map, it looks like
maybe you've upgraded some of your OSDs. Is an upgrade in progress
right now? Perhaps you upgraded some of your OSDs, but not the ones
that just rebooted, and when they went down the cluster raised its
required feature set?
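
A quick way to check for mixed versions, assuming you can reach every
daemon, is to ask each OSD what it is running:

  ceph tell osd.<id> version

(or `ceph daemon osd.<id> version` on the host via the admin socket),
and to look at the crush tunables the cluster currently expects with:

  ceph osd crush show-tunables
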
-Greg

On Fri, Nov 13, 2015 at 8:12 AM, Kenneth Waegeman
<kenneth.waegeman@xxxxxxxx> wrote:
> Hi all,
>
> What could be the reason that all PGs of an entire erasure coded pool are
> stuck stale? All OSDs have been restarted and are up.
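>
> (For reference, the stuck PGs can be listed and matched against the EC
> pool with something like:
>
>   ceph pg dump_stuck stale
>   ceph osd lspools
>
> The numeric prefix of each PG, e.g. the "2" in 2.1b0, should match the
> EC pool's id.)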
>
> The details:
> We have a setup with 14 OSD hosts, each with dedicated OSDs for an erasure
> coded pool and 2 SSDs for a cache pool, plus 3 separate monitor/metadata
> nodes with SSDs for the metadata pool.
>
> This afternoon I had to reboot some OSD nodes because they weren't
> reachable anymore. After the cluster recovered, some PGs were stuck stale.
> `ceph health detail` showed that they were all PGs of 2 specific EC-pool
> OSDs. I tried restarting those OSDs, but that didn't solve the problem. I
> then restarted all OSDs on those nodes, but after that all PGs on the
> EC-pool OSDs of those nodes were stuck stale. The docs say a PG goes stale
> when its OSDs stop reporting to the monitors, so I restarted the monitors.
> Since that did not solve it, I tried restarting everything.
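>
> (A sanity check after the restarts is to confirm that the EC-pool OSDs
> are really marked up and in, with something like:
>
>   ceph osd tree
>   ceph osd stat
>
> assuming the EC-pool OSDs can be told apart in the tree, e.g. by their
> CRUSH host or root.)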
>
> When the cluster had recovered again, all other PGs came back active+clean,
> except for the PGs in the EC pool, which are still stale+active+clean or
> even stale+active+clean+scrubbing+deep.
>
> When I try to query such a PG (e.g. `ceph pg 2.1b0 query`), it just hangs;
> that is not the case for PGs of the other pools.
> If I interrupt it, I get: Error EINTR: problem getting command descriptions
> from pg.2.1b0
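>
> (When the query hangs, one thing worth checking is which OSDs the PG is
> currently mapped to, with something like:
>
>   ceph pg map 2.1b0
>
> which prints the up and acting sets, so the primary OSD's status and log
> can be looked at directly.)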
>
> I can't see anything strange in the logs of the OSDs hosting these PGs
> (attached).
>
> Does anyone have an idea?
>
> Help very much appreciated!
>
> Thanks!
>
> Kenneth
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



