On Thu, Jun 27, 2019 at 10:36 AM ☣Adam <adam@xxxxxxxxx> wrote:
> Well that caused some excitement (either that or the small power
> disruption did)! One of my OSDs is now down because it keeps crashing
> due to a failed assert (stacktraces attached, also I'm apparently
> running mimic, not luminous).
>
> In the past a failed assert on an OSD has meant removing the disk,
> wiping it, re-adding it as a new one, and then having ceph rebuild it
> from other copies of the data.
>
> I did this all manually in the past, but I'm trying to get more
> familiar with ceph's commands. Will the following commands do the same?
>
> ceph-volume lvm zap --destroy --osd-id 11
> # Presumably that has to be run from the node with OSD 11, not just
> # any ceph node?
> # Source: http://docs.ceph.com/docs/mimic/ceph-volume/lvm/zap

That looks correct, and yes, you would need to run it on the node with OSD 11.
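If you're not sure which node that is, I believe `ceph osd find` will
tell you (look at the "host" field under "crush_location" in its output):

ceph osd find 11
# the reported host is where to run the ceph-volume command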
> Do I need to remove the OSD (ceph osd out 11; wait for stabilization;
> ceph osd purge 11) before I do this, and then run "ceph-deploy osd
> create" afterwards?

I think what you need is essentially the same as the guide for
migrating from filestore to bluestore:
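Roughly, I'd expect the full sequence to look something like the
following. Please double-check it against the docs for your release;
the device and hostname below are placeholders, and I'm assuming you
deployed with ceph-deploy:

ceph osd out 11
# wait for backfill/recovery to finish (watch ceph -s)
systemctl stop ceph-osd@11    # on the OSD host, if it isn't already dead
ceph osd purge 11 --yes-i-really-mean-it
ceph-volume lvm zap --destroy --osd-id 11    # on the OSD host
ceph-deploy osd create --data /dev/sdX <osd-host>    # from the admin node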
> Thanks,
> Adam
>
> On 6/26/19 6:35 AM, Paul Emmerich wrote:
> Have you tried: ceph osd force-create-pg <pgid>?
>
> If that doesn't work: use objectstore-tool on the OSD (while it's not
> running) and use it to force mark the PG as complete. (Don't know the
> exact command off the top of my head)
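For what it's worth, I think the invocation Paul has in mind is roughly
the following (run with the OSD stopped; the data path and PG id are
placeholders, so please verify before using it):

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
    --pgid <pgid> --op mark-complete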
>
> Caution: these are obviously really dangerous commands
>
>
>
> Paul
>
>
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
>
> On Wed, Jun 26, 2019 at 1:56 AM ☣Adam <adam@xxxxxxxxx> wrote:
>
> How can I tell ceph to give up on "incomplete" PGs?
>
> I have 12 pgs which are "inactive, incomplete" that won't recover. I
> think this is because in the past I have carelessly pulled disks too
> quickly without letting the system recover. I suspect the disks that
> have the data for these are long gone.
>
> Whatever the reason, I want to fix it so I have a clean cluster even if
> that means losing data.
>
> I went through the "troubleshooting pgs" guide[1] which is excellent,
> but didn't get me to a fix.
>
> The output of `ceph pg 2.0 query` includes this:
> "recovery_state": [
> {
> "name": "Started/Primary/Peering/Incomplete",
> "enter_time": "2019-06-25 18:35:20.306634",
> "comment": "not enough complete instances of this PG"
> },
>
> I've already restarted all OSDs in various orders, and I changed min_size
> to 1 to see if that would allow them to get fixed, but no such luck.
> These pools are not erasure coded and I'm using the Luminous release.
>
> How can I tell ceph to give up on these PGs? There's nothing identified
> as unfound, so mark_unfound_lost doesn't help. I feel like `ceph osd
> lost` might be it, but at this point the OSD numbers have been reused
> for new disks, so I'd really like to limit the damage to the 12 PGs
> which are incomplete if possible.
>
> Thanks,
> Adam
>
> [1]
> http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com