Re: How to fix a Ceph PG in unknown state with no OSDs?

Paul Emmerich <paul.emmerich@xxxxxxxx> · Thu, 14 Jun 2018 22:46:27 +0200

Can you post your whole crushmap?

ceph osd getcrushmap -o crushmap
crushtool -d crushmap -o crushmap.txt

Paul

2018-06-14 22:39 GMT+02:00 Oliver Schulz <oliver.schulz@xxxxxxxxxxxxxx>:
Thanks, Greg!!

I reset all the OSD weights to 1.00, and I think I'm in a much
better state now. The only trouble left in "ceph health detail" is

PG_DEGRADED Degraded data redundancy: 4/404985012 objects degraded (0.000%), 3 pgs degraded
    pg 2.47 is active+recovery_wait+degraded+remapped, acting [177,68,187]
    pg 2.1fd is active+recovery_wait+degraded+remapped, acting [36,83,185]
    pg 2.748 is active+recovery_wait+degraded, acting [31,8,149]

(There are a lot of misplaced PGs now, obviously.) The interesting
thing is that my "lost" PG is back, too, with three acting OSDs.

Maybe I dodged the bullet - what do you think?

One question: Is there a way to give recovery of the three
degraded PGs priority over backfilling the misplaced ones?
I tried "ceph pg force-recovery", but it didn't seem to have
any effect: they were still in "recovery_wait" afterwards.
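
For reference, the degraded PG IDs can be pulled out of the "ceph health detail" text mechanically. Below is a minimal Python sketch, assuming the per-PG lines look exactly like the three quoted above; the helper names (degraded_pgs, force_recovery_command) are made up for illustration. It only builds the "ceph pg force-recovery" command line and does not run it, and it says nothing about why the PGs stayed in recovery_wait.

```python
import re

# Matches per-PG lines of "ceph health detail", for example:
#   pg 2.47 is active+recovery_wait+degraded+remapped, acting [177,68,187]
# The line layout is an assumption taken from the output quoted in this thread.
PG_LINE = re.compile(r"^\s*pg (\S+) is (\S+), acting \[([\d,]*)\]")


def degraded_pgs(health_detail):
    """Return (pgid, states, acting_osds) for each PG whose state includes 'degraded'."""
    found = []
    for line in health_detail.splitlines():
        match = PG_LINE.match(line)
        if not match:
            continue
        pgid, state, acting = match.groups()
        states = state.split("+")
        if "degraded" in states:
            osds = [int(x) for x in acting.split(",") if x]
            found.append((pgid, states, osds))
    return found


def force_recovery_command(pgids):
    """Build (but do not run) a 'ceph pg force-recovery' command line."""
    return ["ceph", "pg", "force-recovery", *pgids]


sample = """\
PG_DEGRADED Degraded data redundancy: 4/404985012 objects degraded (0.000%), 3 pgs degraded
    pg 2.47 is active+recovery_wait+degraded+remapped, acting [177,68,187]
    pg 2.1fd is active+recovery_wait+degraded+remapped, acting [36,83,185]
    pg 2.748 is active+recovery_wait+degraded, acting [31,8,149]
"""

pgs = degraded_pgs(sample)
print(" ".join(force_recovery_command([pgid for pgid, _, _ in pgs])))
```

The returned state list also shows which of the degraded PGs are additionally remapped, which may help when deciding what to look at first.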

Cheers,

Oliver

On 14.06.2018 22:09, Gregory Farnum wrote:

On Thu, Jun 14, 2018 at 4:07 PM Oliver Schulz <oliver.schulz@xxxxxxxxxxxxxx> wrote:

    Hi Greg,

    I increased the hard limit and rebooted everything. The
    PG without acting OSDs still has none, but I also have
    quite a few PGs that look like this, now:

          pg 1.79c is stuck undersized for 470.640254, current state active+undersized+degraded, last acting [179,154]

    I had that problem before (only two acting OSDs on a few PGs),
    and I always solved it by setting the primary OSD to out and then
    back in a few seconds later (resulting in a very quick recovery,
    then all was fine again). But maybe that's not the ideal solution?

    Here's "ceph pg map" for one of them:

          osdmap e526060 pg 1.79c (1.79c) -> up [179,154] acting [179,154]

    I also have two PGs that have only one acting OSD, now:

          osdmap e526060 pg 0.58a (0.58a) -> up [174] acting [174]
          osdmap e526060 pg 2.139 (2.139) -> up [61] acting [61]

    How can I make Ceph assign three OSDs to all of these weird PGs?
    Before the reboot, they all did have three OSDs assigned (except for
    the one that has none), and they were not shown as degraded.

      > If it's the second, then fixing the remapping problem will resolve it.
      > That's probably/hopefully just by undoing the remap-by-utilization
      > changes.

    How do I do that, best? Just set all the weights back to 1.00?

Yeah. This is probably the best way to fix up the other undersized PGs — at least, assuming it doesn't result in an over-full PG!

I don't work with overflowing OSDs/clusters often, but my suspicion is you're better off with something like CERN's reweight scripts than using reweight-by-utilization. Unless it's improved without my noticing, that algorithm just isn't very good. :/

-Greg
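
Setting all the weights back to 1.00 can be scripted. Here is a rough Python sketch, assuming the JSON printed by "ceph osd tree -f json" has a top-level "nodes" list whose OSD entries carry "type", "id" and "reweight" fields (check this against your release); the function name is hypothetical. It only prints the "ceph osd reweight" commands so they can be reviewed before anything is changed.

```python
import json


def reweight_reset_commands(osd_tree_json, target=1.0):
    """Build 'ceph osd reweight <id> <target>' commands for OSDs not already at target.

    The JSON layout ("nodes" entries with "type", "id", "reweight") is an
    assumption about the output of "ceph osd tree -f json".
    """
    tree = json.loads(osd_tree_json)
    commands = []
    for node in tree.get("nodes", []):
        if node.get("type") != "osd":
            continue  # skip roots, hosts and other CRUSH buckets
        if abs(node.get("reweight", target) - target) > 1e-9:
            commands.append(["ceph", "osd", "reweight", str(node["id"]), str(target)])
    return commands


# Made-up sample data for illustration, not taken from the cluster in this thread.
sample = json.dumps({
    "nodes": [
        {"id": -1, "name": "default", "type": "root"},
        {"id": 0, "name": "osd.0", "type": "osd", "reweight": 1.0},
        {"id": 1, "name": "osd.1", "type": "osd", "reweight": 0.85},
    ]
})

for command in reweight_reset_commands(sample):
    print(" ".join(command))
```

Printing rather than executing keeps the step reversible: on a nearly full cluster it is worth checking the list against "ceph osd df" before running any of it.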

    Cheers,

    Oliver

    P.S.: Thanks so much for helping!

    On 14.06.2018 21:37, Gregory Farnum wrote:

     > On Thu, Jun 14, 2018 at 3:26 PM Oliver Schulz
     > <oliver.schulz@xxxxxxxxxxxxxx> wrote:

     >

     >     But the contents of the remapped PGs should still be
     >     Ok, right? What confuses me is that they don't
     >     backfill - why don't they "move" where they belong?
     >
     >     As for the PG hard limit, yes, I ran into this. Our
     >     cluster had been very (very) full, but I wanted the
     >     new OSD nodes to use bluestore, so I updated to
     >     Luminous before I added the additional storage. I
     >     temporarily increased the pg hard limit and after
     >     a while (and after adding the new OSDs) the cluster
     >     seemed to be in a decent state again. Afterwards,
     >     I set the PG hard limit back to normal.
     >
     >     I don't have a "too many PGs per OSD" health warning,
     >     currently - should I still increase the PG hard limit?

     >

     >

     > Well, it's either the hard limit getting hit, or the fact that the PG
     > isn't getting mapped to any OSD and there not being an existing primary
     > to take responsibility for remapping it.
     >
     > If it's the second, then fixing the remapping problem will resolve it.
     > That's probably/hopefully just by undoing the remap-by-utilization changes.

     >

     >

     >     On 14.06.2018 20:58, Gregory Farnum wrote:

     >      > Okay, I can’t tell you what happened to that one pg, but you’ve got
     >      > another 445 remapped pgs and that’s not a good state to be in. It was
     >      > probably your use of reweight-by-utilization. :/ I am pretty sure
     >      > the missing PG and remapped ones have the same root cause, and it’s
     >      > possible but by no means certain fixing one will fix the others.
     >      >
     >      > ...oh, actually, the most likely cause just came up in an unrelated
     >      > conversation. You’ve probably run into the pg overdose protection that
     >      > was added in luminous. Check the list archives for the exact name, but
     >      > you’ll want to increase the pg hard limit and restart the osds that
     >      > exceeded the previous/current setting.
     >      > -Greg

     >      > On Thu, Jun 14, 2018 at 2:33 PM Oliver Schulz
     >      > <oliver.schulz@xxxxxxxxxxxxxx> wrote:

     >      >

     >      >     I'm not running the balancer, but I did reweight-by-utilization
     >      >     a few times recently.
     >      >
     >      >     "ceph osd tree" and "ceph -s" say:
     >      >
     >      >     https://gist.github.com/oschulz/36d92af84851ec42e09ce1f3cacbc110

     >      >

     >      >

     >      >

     >      >     On 14.06.2018 20:23, Gregory Farnum wrote:

     >      >      > Well, if this pg maps to no osds, something has certainly gone wrong
     >      >      > with your crush map. What’s the crush rule it’s using, and what’s the
     >      >      > output of “ceph osd tree”?
     >      >      > Are you running the manager’s balancer module or something that might be
     >      >      > putting explicit mappings into the osd map and breaking it?
     >      >      >
     >      >      > I’m not certain off-hand about the pg reporting, but I believe if it’s
     >      >      > reporting the state as unknown that means there is *no* running osd which
     >      >      > contains any copy of that pg. That’s not something which ceph could do
     >      >      > on its own without failures of osds. What’s the output of “ceph -s”?

     >      >      > On Thu, Jun 14, 2018 at 2:15 PM Oliver Schulz
     >      >      > <oliver.schulz@xxxxxxxxxxxxxx> wrote:

     >      >      >

     >      >      >     Dear Greg,
     >      >      >
     >      >      >     no, it's a very old cluster (continuous operation since 2013,
     >      >      >     with multiple extensions). It's a production cluster and
     >      >      >     there's about 300TB of valuable data on it.
     >      >      >
     >      >      >     We recently updated to luminous and added more OSDs (a month
     >      >      >     ago or so), but everything seemed Ok since then. We didn't have
     >      >      >     any disk failures, but we had trouble with the MDS daemons
     >      >      >     in the last days, so there were a few reboots.
     >      >      >
     >      >      >     Is it somehow possible to find this "lost" PG again? Since
     >      >      >     it's in the metadata pool, large parts of our CephFS directory
     >      >      >     tree are currently unavailable. I turned the MDS daemons off
     >      >      >     for now ...
     >      >      >
     >      >      >     Cheers
     >      >      >
     >      >      >     Oliver

     >      >      >

     >      >      >     On 14.06.2018 19:59, Gregory Farnum wrote:

     >      >      >      > Is this a new cluster? Or did the crush map change somehow recently? One
     >      >      >      > way this might happen is if CRUSH just failed entirely to map a pg,
     >      >      >      > although I think if the pg exists anywhere it should still be getting
     >      >      >      > reported as inactive.

     >      >      >      > On Thu, Jun 14, 2018 at 8:40 AM Oliver Schulz
     >      >      >      > <oliver.schulz@xxxxxxxxxxxxxx> wrote:

     >      >      >      >

     >      >      >      >     Dear all,
     >      >      >      >
     >      >      >      >     I have a serious problem with our Ceph cluster: One of our PGs somehow
     >      >      >      >     ended up in this state (reported by "ceph health detail"):
     >      >      >      >
     >      >      >      >           pg 1.XXX is stuck inactive for ..., current state unknown, last acting []
     >      >      >      >
     >      >      >      >     Also, "ceph pg map 1.xxx" reports:
     >      >      >      >
     >      >      >      >           osdmap e525812 pg 1.721 (1.721) -> up [] acting []
     >      >      >      >
     >      >      >      >     I can't use "ceph pg 1.XXX query", it just hangs with no output.
     >      >      >      >
     >      >      >      >     All OSDs are up and in, I have MON quorum, all other PGs seem to be
     >      >      >      >     fine.
     >      >      >      >
     >      >      >      >     How can I diagnose/fix this? Unfortunately, the PG in question is part
     >      >      >      >     of the CephFS metadata pool ...
     >      >      >      >
     >      >      >      >     Any help would be very, very much appreciated!
     >      >      >      >
     >      >      >      >     Cheers,
     >      >      >      >
     >      >      >      >     Oliver

     >      >      >      >     _______________________________________________
     >      >      >      >     ceph-users mailing list
     >      >      >      >     ceph-users@xxxxxxxxxxxxxx
     >      >      >      >     http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

     >      >      >      >

     >      >      >

     >      >

     >


-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
