Unclean PGs in active+degraded or active+remapped

Pawel Veselov <pawel.veselov@xxxxxxxxx> · Fri, 19 Jul 2013 15:44:52 -0700

Hi.
I'm trying to understand the reason behind some of my unclean placement groups (PGs) after moving some OSDs around. Any help would be greatly appreciated. I'm sure we are missing something, but can't quite figure out what.

[root@ip-10-16-43-12 ec2-user]# ceph health detail
HEALTH_WARN 29 pgs degraded; 68 pgs stuck unclean; recovery 4071/217370 degraded (1.873%)
pg 0.50 is stuck unclean since forever, current state active+degraded, last acting [2]

...
pg 2.4b is stuck unclean for 836.989336, current state active+remapped, last acting [3,2]
...
pg 0.6 is active+degraded, acting [3]

These are distinct examples of the problems. There are a total of 676 placement groups.
Querying them shows pretty much the same for all of them.
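
To get an overview of output like the above, the per-PG lines can be tallied by state. This is only a minimal sketch: it assumes every per-PG line looks like one of the three sample lines shown, and the names PG_LINE, SAMPLE and tally_pg_states are made up for illustration (they are not part of Ceph).

```python
import re
from collections import Counter

# Assumed line shapes, taken from the sample output only:
#   pg 0.50 is stuck unclean since forever, current state active+degraded, last acting [2]
#   pg 0.6 is active+degraded, acting [3]
PG_LINE = re.compile(
    r"^pg (?P<pgid>\S+) is "
    r"(?:stuck \w+ (?:since|for) \S+, current state )?"
    r"(?P<state>[a-z+]+), (?:last )?acting \[(?P<acting>[\d,]*)\]$"
)

# Sample input copied from the health output quoted in this message.
SAMPLE = """\
HEALTH_WARN 29 pgs degraded; 68 pgs stuck unclean; recovery 4071/217370 degraded (1.873%)
pg 0.50 is stuck unclean since forever, current state active+degraded, last acting [2]
pg 2.4b is stuck unclean for 836.989336, current state active+remapped, last acting [3,2]
pg 0.6 is active+degraded, acting [3]
"""


def tally_pg_states(text):
    """Return (Counter of state -> PG count, dict of pgid -> acting OSD ids)."""
    states = {}
    acting = {}
    for line in text.splitlines():
        match = PG_LINE.match(line.strip())
        if match is None:
            continue  # summary line, "...", or blank line
        pgid = match["pgid"]
        # Keyed by pgid, so a PG listed twice is counted only once.
        states[pgid] = match["state"]
        acting[pgid] = [int(osd) for osd in match["acting"].split(",") if osd]
    return Counter(states.values()), acting


if __name__ == "__main__":
    counts, acting_sets = tally_pg_states(SAMPLE)
    for state, count in counts.most_common():
        print(state, count)
    # PGs whose acting set has a single OSD have no replica to recover from.
    print([pgid for pgid, osds in acting_sets.items() if len(osds) == 1])
```

On a live cluster the text would come from the full "ceph health detail" output rather than from SAMPLE.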

crush map: http://pastebin.com/4Hkkgau6

There are some pg_temp entries (I don't quite understand what those are) that are mapped to nonexistent OSDs.
osdmap: http://pastebin.com/irbRNYJz
queries for all stuck placement groups: http://pastebin.com/kzYa6s2G

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com