Manually mucked up pg, need help fixing

This is all on firefly rc1 on CentOS 6

I had an osd getting overfull, and misinterpreting directions I downed 
it and then manually removed pg directories from the osd mount. After 
restarting it and a good deal of rebalancing (setting osd weights as I 
should have originally; a rough sketch of that follows the status output 
below), I'm now at:

     cluster de10594a-0737-4f34-a926-58dc9254f95f
      health HEALTH_WARN 2 pgs backfill; 1 pgs incomplete; 1 pgs stuck inactive; 308 pgs stuck unclean; recovery 1/2420563 objects degraded (0.000%); noout flag(s) set
      monmap e7: 3 mons at {controller1=10.100.2.1:6789/0,controller2=10.100.2.2:6789/0,controller3=10.100.2.3:6789/0}, election epoch 556, quorum 0,1,2 controller1,controller2,controller3
      mdsmap e268: 1/1/1 up {0=controller1=up:active}
      osdmap e3492: 5 osds: 5 up, 5 in
             flags noout
       pgmap v4167420: 320 pgs, 15 pools, 4811 GB data, 1181 kobjects
             9770 GB used, 5884 GB / 15654 GB avail
             1/2420563 objects degraded (0.000%)
                    3 active
                   12 active+clean
                    2 active+remapped+wait_backfill
                    1 incomplete
                  302 active+remapped
   client io 364 B/s wr, 0 op/s
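
For completeness, the rebalancing above was just CRUSH reweights, roughly along these lines (the weight value here is illustrative, not the one I actually used):

# ceph osd crush reweight osd.4 1.0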

# ceph pg dump | grep 0.2f
dumped all in format plain
0.2f    0       0       0       0       0       0       0       incomplete      2014-05-03 11:38:01.526832      0'0     3492:23 [4]     4       [4]     4       2254'20053      2014-04-28 00:24:36.504086      2100'18109      2014-04-26 22:26:23.699330

# ceph pg map 0.2f
osdmap e3492 pg 0.2f (0.2f) -> up [4] acting [4]

The pg query for the downed pg is at 
https://gist.github.com/jeffb-bt/c8730899ff002070b325

Of course, the osd I manually mucked with is the only one the cluster is 
picking up as up/acting. Now, I can query the pg and find epochs where 
other osds (that I didn't jack up) were acting. And in fact, the latest 
of those entries (osd.1) has the pg directory in its osd mount, and it's 
a good healthy 59 GB.
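
The way I've been finding those older acting sets is just from the pg query output (the peering / past intervals info under recovery_state), e.g.:

# ceph pg 0.2f query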

I've tried manually rsync'ing (and preserving attributes) that set of 
directories from osd.1 to osd.4 without success. Likewise I've tried 
copying the directories over without attributes set. I've done many, 
many deep scrubs but the pg query does not show the scrub timestamps 
being affected.
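
Concretely, the copy attempts looked roughly like this, assuming the default filestore mount points under /var/lib/ceph/osd (the destination hostname is a placeholder), followed by kicking off another deep scrub:

# rsync -aX /var/lib/ceph/osd/ceph-1/current/0.2f_head/ \
       root@<osd4-host>:/var/lib/ceph/osd/ceph-4/current/0.2f_head/
# ceph pg deep-scrub 0.2f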

I'm seeking ideas either for fixing the metadata on the osd.4 directory 
so this pg gets seen/recognized, or for forcing the cluster's pg map to 
point at osd.1 for the incomplete pg (basically wiping out the cluster's 
memory that osd.4 ever had 0.2f). Or any other solution :) It's only 
59 GB, so worst case I'll mark it lost and recreate the pg, but I'd 
prefer to learn enough of the innards to understand what's going on and 
the possible ways of fixing it.
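
For that worst case, the only blunt instrument I'm aware of is something like the following, which as I understand it recreates the pg empty, though I'm not at all sure it's the right tool here and I haven't run it:

# ceph pg force_create_pg 0.2f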

Thanks for any help,

Jeff


