Re: Slow Request on OSD

Dan Jakubiec <dan.jakubiec@xxxxxxxxx> · Thu, 1 Sep 2016 18:06:11 -0700

Thank you for all the help, Wido:
On Sep 1, 2016, at 14:03, Wido den Hollander <wido@xxxxxxxx> wrote:

You have to mark those OSDs as lost and also force create the incomplete PGs.

This might be the root of our problems.  We didn't mark the parent OSD as "lost" before we removed it.  Now ceph won't let us mark it as lost (and it is no longer in the OSD tree):

djakubiec@dev:~$ ceph osd lost 8 --yes-i-really-mean-it
osd.8 is not down or doesn't exist

djakubiec@dev:~$ ceph osd tree
ID WEIGHT   TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 58.19960 root default
-2  7.27489     host node24
 1  7.27489         osd.1        up  1.00000          1.00000
-3  7.27489     host node25
 2  7.27489         osd.2        up  1.00000          1.00000
-4  7.27489     host node26
 3  7.27489         osd.3        up  1.00000          1.00000
-5  7.27489     host node27
 4  7.27489         osd.4        up  1.00000          1.00000
-6  7.27489     host node28
 5  7.27489         osd.5        up  1.00000          1.00000
-7  7.27489     host node29
 6  7.27489         osd.6        up  1.00000          1.00000
-8  7.27539     host node30
 9  7.27539         osd.9        up  1.00000          1.00000
-9  7.27489     host node31
 7  7.27489         osd.7        up  1.00000          1.00000

BUT, even though OSD 8 no longer exists, I still see lots of references to OSD 8 in various dumps and queries.

Interestingly, I do still see weird entries in the CRUSH map (should I do something about these?):

# devices
device 0 device0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 device8
device 9 osd.9
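
For reference, a minimal sketch of how entries like these could be picked out of a decompiled CRUSH map automatically. As far as I understand, a line of the form "device N deviceN" is a placeholder the decompiler prints for a device ID that has no OSD behind it, but treat that as an assumption. The function name list_placeholder_devices and the file name crushmap.txt are hypothetical, not anything Ceph provides.

```shell
#!/bin/sh
# Print the ID of every device whose name is just the "deviceN" placeholder,
# i.e. lines of the form "device 8 device8" in a decompiled CRUSH map.
# Reads the named file(s), or stdin when no file is given.
list_placeholder_devices() {
    awk '$1 == "device" && $3 == ("device" $2) { print $2 }' "$@"
}

# Example usage (assumes the map has been decompiled to ./crushmap.txt first):
#   ceph osd getcrushmap -o crushmap.bin
#   crushtool -d crushmap.bin -o crushmap.txt
#   list_placeholder_devices crushmap.txt
```

Run against the device list above, this would report IDs 0 and 8, which matches the OSD IDs missing from the tree.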

I then tried on all 80 incomplete PGs:

	ceph pg force_create_pg <pgid>

The 80 PGs moved to "creating" for a few minutes but then all went back to "incomplete".
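
In case it helps anyone reproduce this, here is a rough sketch of running that command over many PGs rather than typing each one. It assumes the PG listing is piped in as text with the PG ID in the first column and the state somewhere on the same line (which is how I read the "ceph pg dump_stuck inactive" output here, but check your version). The helper names incomplete_pgids and force_create_all and the DRY_RUN variable are made up for illustration.

```shell
#!/bin/sh
# Print the PG ID (first column) of every input line that mentions "incomplete".
# A PG ID is assumed to look like <pool>.<hex>, e.g. 1.2f; header lines are skipped.
incomplete_pgids() {
    awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ && /incomplete/ { print $1 }'
}

# Run force_create_pg for each PG ID read from stdin.
# DRY_RUN defaults to 1, in which case the commands are only printed.
force_create_all() {
    while read -r pgid; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ceph pg force_create_pg $pgid"
        else
            ceph pg force_create_pg "$pgid"
        fi
    done
}

# Example usage (prints the commands; set DRY_RUN=0 to actually run them):
#   ceph pg dump_stuck inactive | incomplete_pgids | force_create_all
```

Keeping the dry run as the default seems prudent, since force-creating a PG gives up on whatever data it held.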

Is there some way to force individual PGs to be marked as "lost"?

Thanks!

-- Dan

But I think you have lost so many objects that the cluster is beyond a point of repair honestly.

Wido

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com