Thanks. That is a cool utility; unfortunately, I'm pretty sure the pg in
question held cephfs objects rather than rbd images (because mounting cephfs
is the only noticeable brokenness).

Jeff

On 05/05/2014 06:43 PM, Jake Young wrote:
> I was in a similar situation where I could see the PG's data on an osd,
> but there was nothing I could do to force the pg to use that osd's copy.
>
> I ended up using the rbd_restore tool to create my rbd image on disk and
> then reimported it into the pool.
>
> See this thread for info on rbd_restore:
> http://www.spinics.net/lists/ceph-devel/msg11552.html
>
> Of course, you have to copy all of the pieces of the rbd image onto one
> file system somewhere (thank goodness for thin provisioning!) for the
> tool to work.
>
> There really should be a better way.
>
> Jake
>
> On Monday, May 5, 2014, Jeff Bachtel <jbachtel at bericotechnologies.com> wrote:
>
>     Well, that'd be the ideal solution. Please check out the github
>     gist I posted, though. It seems that despite osd.4 having nothing
>     good for pg 0.2f, the cluster does not acknowledge that any other osd
>     has a copy of the pg. I've tried downing osd.4 and manually
>     deleting the pg directory in question, in the hope that the cluster
>     would roll back epochs for 0.2f, but all it does is recreate the pg
>     directory (empty) on osd.4.
>
>     Jeff
>
>     On 05/05/2014 04:33 PM, Gregory Farnum wrote:
>
>         What does your cluster look like? I wonder if you can just remove
>         the bad PG from osd.4 and let it recover from the existing osd.1.
>         -Greg
>         Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
>         On Sat, May 3, 2014 at 9:17 AM, Jeff Bachtel
>         <jbachtel at bericotechnologies.com> wrote:
>
>             This is all on firefly rc1 on CentOS 6.
>
>             I had an osd getting too full, and, misinterpreting directions,
>             I downed it and then manually removed pg directories from the
>             osd mount. After a restart and a good deal of rebalancing
>             (setting osd weights as I should have originally), I'm now at
>
>                 cluster de10594a-0737-4f34-a926-58dc9254f95f
>                  health HEALTH_WARN 2 pgs backfill; 1 pgs incomplete; 1 pgs stuck
>                         inactive; 308 pgs stuck unclean; recovery 1/2420563 objects
>                         degraded (0.000%); noout flag(s) set
>                  monmap e7: 3 mons at {controller1=10.100.2.1:6789/0,
>                         controller2=10.100.2.2:6789/0,controller3=10.100.2.3:6789/0},
>                         election epoch 556, quorum 0,1,2
>                         controller1,controller2,controller3
>                  mdsmap e268: 1/1/1 up {0=controller1=up:active}
>                  osdmap e3492: 5 osds: 5 up, 5 in
>                         flags noout
>                  pgmap v4167420: 320 pgs, 15 pools, 4811 GB data, 1181 kobjects
>                         9770 GB used, 5884 GB / 15654 GB avail
>                         1/2420563 objects degraded (0.000%)
>                                3 active
>                               12 active+clean
>                                2 active+remapped+wait_backfill
>                                1 incomplete
>                              302 active+remapped
>                   client io 364 B/s wr, 0 op/s
>
>             # ceph pg dump | grep 0.2f
>             dumped all in format plain
>             0.2f  0  0  0  0  0  0  0  incomplete  2014-05-03 11:38:01.526832
>             0'0  3492:23  [4]  4  [4]  4  2254'20053  2014-04-28 00:24:36.504086
>             2100'18109  2014-04-26 22:26:23.699330
>
>             # ceph pg map 0.2f
>             osdmap e3492 pg 0.2f (0.2f) -> up [4] acting [4]
>
>             The pg query for the downed pg is at
>             https://gist.github.com/jeffb-bt/c8730899ff002070b325
>
>             Of course, the osd I manually mucked with is the only one the
>             cluster is picking up as up/acting. Now, I can query the pg and
>             find epochs where other osds (that I didn't jack up) were acting.
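
For reference, that acting-set history can be pulled straight out of the saved
pg query output. This is only a rough sketch, assuming the JSON from the pg
query still carries the usual past-interval/acting fields (the exact field
names vary a bit between releases):

    # ceph pg 0.2f query > /tmp/pg-0.2f-query.json
    # grep -n -A 3 '"acting"' /tmp/pg-0.2f-query.json

Any interval whose acting list includes an osd other than 4 (hopefully osd.1
here) is a candidate source for the data.
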
>             And in fact, the latest of those entries (osd.1) has the pg
>             directory in its osd mount, and it's a good, healthy 59 GB.
>
>             I've tried manually rsync'ing (preserving attributes) that set of
>             directories from osd.1 to osd.4 without success. Likewise, I've
>             tried copying the directories over without attributes set. I've
>             done many, many deep scrubs, but the pg query does not show the
>             scrub timestamps being affected.
>
>             I'm seeking ideas either for fixing the metadata on the directory
>             on osd.4 so that this pg is seen/recognized, or for forcing the
>             cluster's pg map to point to osd.1 for the incomplete pg
>             (basically wiping out the cluster's memory that osd.4 ever had
>             0.2f). Or any other solution :) It's only 59 GB, so worst case
>             I'll mark it lost and recreate the pg, but I'd prefer to learn
>             enough of the innards to understand what is going on, and
>             possible means of fixing it.
>
>             Thanks for any help,
>
>             Jeff
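
If it does come to the mark-it-lost route mentioned above, it would presumably
look something like the sketch below. This is from memory and only an
assumption about the right commands on firefly, so double-check against the
documentation for your release before running anything:

    # ceph osd lost 4 --yes-i-really-mean-it     (declare osd.4's copy of the pg gone)
    # ceph pg force_create_pg 0.2f               (recreate the incomplete pg, empty)

Both steps throw away whatever the cluster still associates with osd.4 for
that pg, so they only make sense after giving up on recovering the 59 GB from
osd.1.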