Hi,

I'm running into an issue with Ceph 0.94.2/3 where, after doing a recovery test, 9 PGs stay incomplete:

     osdmap e78770: 2294 osds: 2294 up, 2294 in
      pgmap v1972391: 51840 pgs, 7 pools, 220 TB data, 185 Mobjects
            755 TB used, 14468 TB / 15224 TB avail
               51831 active+clean
                   9 incomplete

As you can see, all 2294 OSDs are up and in, and almost all PGs became active+clean again, except for 9.

I found out that these PGs are the problem:

10.3762
7.309e
7.29a2
10.2289
7.17dd
10.165a
7.1050
7.c65
10.abf

Digging further, all of these PGs map back to an OSD running on the same host, 'ceph-stg-01' in this case.

$ ceph pg 10.3762 query

Looking at the recovery state, this is shown:

{
    "first": 65286,
    "last": 67355,
    "maybe_went_rw": 0,
    "up": [
        1420,
        854,
        1105
    ],
    "acting": [
        1420
    ],
    "primary": 1420,
    "up_primary": 1420
},

osd.1420 is online. I tried restarting it, but nothing happens: these 9 PGs stay incomplete.

Under 'peer_info' I see both osd.854 and osd.1105 reporting about the PG with identical numbers. I restarted both 854 and 1105 as well, without result.

The full output of the PG query can be found here: http://pastebin.com/qQL699zC

The cluster is running a mix of 0.94.2 and 0.94.3 on Ubuntu 14.04.2 with the 3.13 kernel. XFS is used as the backing filesystem.

Any suggestions on how to fix this? There is no valuable data in these pools, so I could remove them, but I'd rather find and fix the root cause.

--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
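
P.S. In case it is useful to anyone checking their own cluster: something along these lines should map each incomplete PG back to its acting primary and the host that OSD runs on. This is only a rough sketch; it assumes jq is installed on the node you run it from, that 'ceph pg <pgid> query' prints JSON (as it does on Hammer), and that 'ceph osd find' reports the host under 'crush_location' - the exact field names may differ between versions.

    for pg in 10.3762 7.309e 7.29a2 10.2289 7.17dd 10.165a 7.1050 7.c65 10.abf; do
        # acting[0] in the pg query output is the acting primary
        primary=$(ceph pg "$pg" query | jq -r '.acting[0]')
        # ceph osd find prints JSON with the OSD's CRUSH location, including the host
        host=$(ceph osd find "$primary" | jq -r '.crush_location.host')
        echo "pg $pg -> acting primary osd.$primary on host $host"
    done

For these 9 PGs everything points back to ceph-stg-01, as described above.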