Hi folks,

I've been rebuilding drives in my cluster to add space. This has gone well so far, but after the last batch of rebuilds I'm left with one placement group in an incomplete state:

[sudo] password for jpr:
HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 1 pgs stuck unclean
pg 3.ea is stuck inactive since forever, current state incomplete, last acting [30,11]
pg 3.ea is stuck unclean since forever, current state incomplete, last acting [30,11]
pg 3.ea is incomplete, acting [30,11]

I've restarted both OSDs a few times, but that hasn't cleared the error. On the primary I see errors in the log related to slow requests:

2015-10-20 08:40:36.678569 7f361585c700 0 log [WRN] : 8 slow requests, 3 included below; oldest blocked for > 31.922487 secs
2015-10-20 08:40:36.678580 7f361585c700 0 log [WRN] : slow request 31.531606 seconds old, received at 2015-10-20 08:40:05.146902: osd_op(client.158903.1:343217143 rb.0.25cf8.238e1f29.00000000a044 [read 1064960~262144] 3.ae9968ea RETRY) v4 currently reached pg
2015-10-20 08:40:36.678592 7f361585c700 0 log [WRN] : slow request 31.531591 seconds old, received at 2015-10-20 08:40:05.146917: osd_op(client.158903.1:343217144 rb.0.25cf8.238e1f29.00000000a044 [read 2113536~262144] 3.ae9968ea RETRY) v4 currently reached pg
2015-10-20 08:40:36.678599 7f361585c700 0 log [WRN] : slow request 31.531551 seconds old, received at 2015-10-20 08:40:05.146957: osd_op(client.158903.1:343232634 ekessler-default.rbd [watch 35~0] 3.e4bd50ea) v4 currently reached pg

Notes online suggest this is an issue with the journal and that it may be possible to export and rebuild the pg (I don't have firefly):

https://ceph.com/community/incomplete-pgs-oh-my/

Interestingly, pg 3.ea appears to be complete on osd.11 (the secondary) but missing entirely on osd.30 (the primary).
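For reference, the export/rebuild approach from the linked blog post can be sketched roughly as below. This is a hedged outline only, assuming osd.11's copy of pg 3.ea is intact and that the ceph-objectstore-tool binary is available (older releases such as firefly/giant shipped it as ceph_objectstore_tool); paths and service commands are examples and will differ per deployment, and both OSDs must be stopped while the tool runs against their stores.

```shell
# Dig deeper first: dump the PG's peering state, history, and why it is incomplete.
ceph pg 3.ea query

# Stop both OSDs so their object stores are quiescent (service manager varies).
sudo service ceph stop osd.11
sudo service ceph stop osd.30

# On the secondary (osd.11), export the intact copy of pg 3.ea to a file.
sudo ceph-objectstore-tool \
    --data-path /var/lib/ceph/osd/ceph-11 \
    --journal-path /var/lib/ceph/osd/ceph-11/journal \
    --pgid 3.ea --op export --file /tmp/pg3.ea.export

# On the primary (osd.30), remove the empty PG shard, then import the export.
sudo ceph-objectstore-tool \
    --data-path /var/lib/ceph/osd/ceph-30 \
    --journal-path /var/lib/ceph/osd/ceph-30/journal \
    --pgid 3.ea --op remove
sudo ceph-objectstore-tool \
    --data-path /var/lib/ceph/osd/ceph-30 \
    --journal-path /var/lib/ceph/osd/ceph-30/journal \
    --pgid 3.ea --op import --file /tmp/pg3.ea.export

# Restart both OSDs and watch the PG peer and go active+clean.
sudo service ceph start osd.11
sudo service ceph start osd.30
ceph -w
```

As always with object-store surgery, it would be prudent to keep the export file (and ideally a copy of the secondary's 3.ea_head directory) until the PG is confirmed healthy.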
On osd.30 (the primary):

crowbar@da0-36-9f-0e-2b-88:~$ du -sk /var/lib/ceph/osd/ceph-30/current/3.ea_head/
0       /var/lib/ceph/osd/ceph-30/current/3.ea_head/

On osd.11 (the secondary):

crowbar@da0-36-9f-0e-2b-40:~$ du -sh /var/lib/ceph/osd/ceph-11/current/3.ea_head/
63G     /var/lib/ceph/osd/ceph-11/current/3.ea_head/

This makes some sense, since my drive-rebuilding activity reformatted the primary, osd.30. It also gives me some hope that my data is not lost.

I understand that "incomplete" means a problem with the journal, but is there a way to dig deeper into this, or to get the secondary's data to take over?

Thanks,
~jpr

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com