On Thu, Jun 30, 2016 at 11:34 PM, Brian Felton <bjfelton@xxxxxxxxx> wrote:
> Sure. Here's a complete query dump of one of the 30 pgs:
> http://pastebin.com/NFSYTbUP

Looking at that, something immediately stands out. There are a lot of
entries in "past_intervals" like so.

    "past_intervals": [
        {
            "first": 18522,
            "last": 18523,
            "maybe_went_rw": 1,
            "up": [
                2147483647,
    ...
            "acting": [
                2147483647,
                2147483647,
                2147483647,
                2147483647
            ],
            "primary": -1,
            "up_primary": -1

That value (2147483647 is the decimal form of 0x7fffffff) is defined in
src/crush/crush.h like so;

#define CRUSH_ITEM_NONE 0x7fffffff /* no result */

In other words, CRUSH returned no OSD at all for those slots in the up
and acting sets, so it looks like this could be due to a bad crush rule
(or at least a previously un-satisfiable rule).

Could you share the output from the following?

$ ceph osd crush rule ls

And, for each rule listed by the above command:

$ ceph osd crush rule dump [rule_name]

I'd then dump out the crushmap and test it, showing any bad mappings,
with the commands listed here;

http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon
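Roughly, that boils down to something like the sketch below. I haven't
run this against your cluster, so treat it as a starting point:
crushmap.bin is just an arbitrary file name, the rule number comes from
the 'ceph osd crush rule dump' output above, and --num-rep would be 4
for your k=3 m=1 pool;

$ ceph osd getcrushmap -o crushmap.bin
$ crushtool -i crushmap.bin --test --show-bad-mappings \
      --rule <rule_id> --num-rep 4 --min-x 1 --max-x 10000

The --min-x/--max-x range just controls how many sample inputs
crushtool tries. Any lines in the --show-bad-mappings output are inputs
the rule could not map to a full set of 4 OSDs, which would line up
with the CRUSH_ITEM_NONE entries above.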
That should hopefully give some insight.

HTH,
Brad

>
> Brian
>
> On Wed, Jun 29, 2016 at 6:25 PM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
>>
>> On Thu, Jun 30, 2016 at 3:22 AM, Brian Felton <bjfelton@xxxxxxxxx> wrote:
>> > Greetings,
>> >
>> > I have a lab cluster running Hammer 0.94.6 and being used exclusively
>> > for object storage. The cluster consists of four servers running 60
>> > 6TB OSDs each. The main .rgw.buckets pool is using k=3 m=1 erasure
>> > coding and contains 8192 placement groups.
>> >
>> > Last week, one of our guys out-ed and removed one OSD from each of
>> > three of the four servers in the cluster, which resulted in some
>> > general badness (the disks were wiped post-removal, so the data are
>> > gone). After a proper education in why this is a Bad Thing, we got
>> > the OSDs added back. When all was said and done, we had 30 pgs that
>> > were stuck incomplete, and no amount of magic has been able to get
>> > them to recover. From reviewing the data, we knew that all of these
>> > pgs contained at least 2 of the removed OSDs; I understand and accept
>> > that the data are gone, and that's not a concern (yay lab).
>> >
>> > Here are the things I've tried:
>> >
>> > - Restarted all OSDs
>> > - Stopped all OSDs, removed all OSDs from the crush map, and started
>> >   everything back up
>> > - Executed a 'ceph pg force_create_pg <id>' for each of the 30 stuck
>> >   pgs
>> > - Executed a 'ceph pg send_pg_creates' to get the ball rolling on
>> >   creates
>> > - Executed several 'ceph pg <id> query' commands to ensure we were
>> >   referencing valid OSDs after the 'force_create_pg'
>> > - Ensured those OSDs were really removed (e.g. 'ceph auth del', 'ceph
>> >   osd crush remove', and 'ceph osd rm')
>>
>> Can you share some of the pg query output?
>>
>> >
>> > At this point, I've got the same 30 pgs that are stuck creating. I've
>> > run out of ideas for getting this back to a healthy state. In
>> > reviewing the other posts on the mailing list, the overwhelming
>> > solution was a bad OSD in the crush map, but I'm all but certain that
>> > isn't what's hitting us here. Normally, being the lab, I'd consider
>> > nuking the .rgw.buckets pool and starting from scratch, but we've
>> > recently spent a lot of time pulling 140TB of data into this cluster
>> > for some performance and recovery tests, and I'd prefer not to have
>> > to start that process again. I am willing to entertain most any other
>> > idea irrespective of how destructive it is to these PGs, so long as I
>> > don't have to lose the rest of the data in the pool.
>> >
>> > Many thanks in advance for any assistance here.
>> >
>> > Brian Felton
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>> --
>> Cheers,
>> Brad
>
>

--
Cheers,
Brad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com