On Fri, Jul 19, 2013 at 3:44 PM, Pawel Veselov <pawel.veselov@xxxxxxxxx> wrote:
> Hi.
>
> I'm trying to understand the reason behind some of my unclean pages, after
> moving some OSDs around. Any help would be greatly appreciated. I'm sure we
> are missing something, but can't quite figure out what.
>
> [root@ip-10-16-43-12 ec2-user]# ceph health detail
> HEALTH_WARN 29 pgs degraded; 68 pgs stuck unclean; recovery 4071/217370
> degraded (1.873%)
> pg 0.50 is stuck unclean since forever, current state active+degraded, last
> acting [2]
> ...
> pg 2.4b is stuck unclean for 836.989336, current state active+remapped, last
> acting [3,2]
> ...
> pg 0.6 is active+degraded, acting [3]
>
> These are distinct examples of problems. There are a total of 676 page groups.
> Query shows pretty much the same on them: .

Nit: PG = "placement group". :)

Anyway, the problem appears to be that you've got two OSDs total, buried
under a bit of a hierarchy (rack and host, each), and the pseudo-random
nature of CRUSH is just having trouble reaching both of them when mapping
all the PGs. If you aren't using the kernel client (or are using a very new
one, >= 3.9), you can run "ceph crush set tunables optimal" (see
http://ceph.com/docs/master/rados/operations/crush-map/#tunables) and this
should all get better thanks to some improved settings we worked out last
year.

On Fri, Jul 19, 2013 at 4:20 PM, Mike Lowe <j.michael.lowe@xxxxxxxxx> wrote:
> I'm by no means an expert, but from what I understand you do need to stick
> to numbering from zero if you want things to work out in the long term.

This is good general advice, but it wouldn't cause the kinds of issues seen
here; non-contiguous numbering is really only a problem if the list of OSD
numbers is very sparse.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
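
For reference, a rough sketch of the tunables workflow described above,
using the profile-based command documented at the link (exact syntax may
differ on older releases, so check the docs for your version):

    # Export and decompile the current CRUSH map to inspect any tunable lines
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    grep tunable crushmap.txt

    # Switch to the pre-defined "optimal" tunables profile; make sure all
    # clients (especially kernel clients) are new enough to understand it first
    ceph osd crush tunables optimal

Applying new tunables recalculates the PG mappings, so expect some data
movement while the cluster rebalances.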