Re: PG stuck peering after host reboot

<george.vasilakakos@xxxxxxxxxx> · Fri, 17 Feb 2017 10:09:22 +0000

Hi Wido,

In an effort to get the cluster to complete peering that PG (as we need to be able to use our pool) we have removed osd.595 from the CRUSH map to allow a new mapping to occur.

When I left the office yesterday, osd.307 had replaced osd.595 in the up set, but the acting set had CRUSH_ITEM_NONE in place of the primary. The PG was in a remapped+peering state, and recovery was taking place for the other PGs that lived on that OSD.
Worth noting that osd.307 is on the same host as osd.595.
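
For anyone reading the acting sets quoted further down: the value 2147483647 that appears in them is 0x7fffffff, which is CRUSH_ITEM_NONE, i.e. no OSD is mapped to that slot. A minimal Python sketch for spotting which rank is unmapped; the function names are illustrative only and not part of any Ceph tooling:

```python
# CRUSH_ITEM_NONE marks a slot with no OSD mapped to it.
# 0x7fffffff is the 2147483647 seen in the acting sets in this thread.
CRUSH_ITEM_NONE = 0x7FFFFFFF


def describe_acting_set(acting):
    """Return (rank, label) pairs, marking unmapped slots explicitly."""
    return [
        (rank, "NONE" if osd == CRUSH_ITEM_NONE else f"osd.{osd}")
        for rank, osd in enumerate(acting)
    ]


def missing_ranks(acting):
    """Ranks (positions in the set) that currently have no OSD assigned."""
    return [rank for rank, osd in enumerate(acting) if osd == CRUSH_ITEM_NONE]


# Acting set reported for pg 1.323 after restarting OSDs (quoted below).
acting = [2147483647, 1391, 240, 127, 937, 362, 267, 320, 7, 634, 716]

for rank, label in describe_acting_set(acting):
    print(rank, label)
print("unmapped ranks:", missing_ranks(acting))
```

Running the same check on the other acting set quoted in the thread, the one starting with 595, would flag rank 2 instead, which is the slot osd.240 occupied originally.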

We’ll take a look at osd.595 as you suggested.

On 17/02/2017, 06:48, "Wido den Hollander" <wido@xxxxxxxx> wrote:

>
>> On 16 February 2017 at 14:55, george.vasilakakos@xxxxxxxxxx wrote:
>> 
>> 
>> Hi folks,
>> 
>> I have just made a tracker for this issue: http://tracker.ceph.com/issues/18960
>> I used ceph-post-file to upload some logs from the primary OSD for the troubled PG.
>> 
>> Any help would be appreciated.
>> 
>> If we can't get it to peer, we'd like to at least get it unstuck, even if it means data loss.
>> 
>> What's the proper way to go about doing that?
>
>Can you try this:
>
>1. Go to the host
>2. Stop OSD 595
>3. ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-595 --op info --pgid 1.323
>
>What does osd.595 think about that PG?
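
If the same check needs to be repeated on several OSDs in the PG, the command from step 3 can be generated rather than typed. This is only a sketch in Python: it builds the argument list using exactly the flags shown above and does not run anything. The helper name and the `cluster` parameter are assumptions for illustration, and the OSD must already be stopped as in step 2.

```python
import shlex


def objectstore_info_cmd(osd_id, pgid, cluster="ceph"):
    """Build the ceph-objectstore-tool argv for '--op info' on one PG.

    Assumes the default data path layout /var/lib/ceph/osd/<cluster>-<id>,
    as in the command quoted above.
    """
    data_path = f"/var/lib/ceph/osd/{cluster}-{osd_id}"
    return [
        "ceph-objectstore-tool",
        "--data-path", data_path,
        "--op", "info",
        "--pgid", pgid,
    ]


# Print the command for review; pass the list to subprocess.run() on the
# OSD host only once the OSD daemon is stopped.
print(shlex.join(objectstore_info_cmd(595, "1.323")))
```

Keeping the command as a list avoids shell quoting problems if it is later handed to `subprocess.run`.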
>
>You could even try 'rm-past-intervals' with the object-store tool, but that might be a bit dangerous. Wouldn't do that immediately.
>
>Wido
>
>> 
>> Best regards,
>> 
>> George
>> ________________________________________
>> From: ceph-users [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of george.vasilakakos@xxxxxxxxxx [george.vasilakakos@xxxxxxxxxx]
>> Sent: 14 February 2017 10:27
>> To: bhubbard@xxxxxxxxxx; ceph-users@xxxxxxxxxxxxxx
>> Subject: Re:  PG stuck peering after host reboot
>> 
>> Hi Brad,
>> 
>> I'll be doing so later in the day.
>> 
>> Thanks,
>> 
>> George
>> ________________________________________
>> From: Brad Hubbard [bhubbard@xxxxxxxxxx]
>> Sent: 13 February 2017 22:03
>> To: Vasilakakos, George (STFC,RAL,SC); Ceph Users
>> Subject: Re:  PG stuck peering after host reboot
>> 
>> I'd suggest creating a tracker and uploading a full debug log from the
>> primary so we can look at this in more detail.
>> 
>> On Mon, Feb 13, 2017 at 9:11 PM,  <george.vasilakakos@xxxxxxxxxx> wrote:
>> > Hi Brad,
>> >
>> > I can't tell you, because `ceph pg 1.323 query` never completes; it just hangs.
>> >
>> > On 11/02/2017, 00:40, "Brad Hubbard" <bhubbard@xxxxxxxxxx> wrote:
>> >
>> >     On Thu, Feb 9, 2017 at 3:36 AM,  <george.vasilakakos@xxxxxxxxxx> wrote:
>> >     > Hi Corentin,
>> >     >
>> >     > I've tried that. The primary hangs on injectargs, so I set the option in the config file and restarted all OSDs in the PG. It came up with:
>> >     >
>> >     > pg 1.323 is remapped+peering, acting [595,1391,2147483647,127,937,362,267,320,7,634,716]
>> >     >
>> >     > Still can't query the PG, no error messages in the logs of osd.240.
>> >     > The logs on osd.595 and osd.7 still fill up with the same messages.
>> >
>> >     So what does "peering_blocked_by_detail" show in that case since it
>> >     can no longer show "peering_blocked_by_history_les_bound"?
>> >
>> >     >
>> >     > Regards,
>> >     >
>> >     > George
>> >     > ________________________________
>> >     > From: Corentin Bonneton [list@xxxxxxxx]
>> >     > Sent: 08 February 2017 16:31
>> >     > To: Vasilakakos, George (STFC,RAL,SC)
>> >     > Cc: ceph-users@xxxxxxxxxxxxxx
>> >     > Subject: Re:  PG stuck peering after host reboot
>> >     >
>> >     > Hello,
>> >     >
>> >     > I have run into this before. I applied the parameter (osd_find_best_info_ignore_history_les) to all the OSDs that reported blocked queries.
>> >     >
>> >     > --
>> >     > Regards,
>> >     > CEO FEELB | Corentin BONNETON
>> >     > contact@xxxxxxxx
>> >     >
>> >     > On 8 Feb 2017 at 17:17, george.vasilakakos@xxxxxxxxxx wrote:
>> >     >
>> >     > Hi Ceph folks,
>> >     >
>> >     > I have a cluster running Jewel 10.2.5 using a mix of EC and replicated pools.
>> >     >
>> >     > After rebooting a host last night, one PG refuses to complete peering:
>> >     >
>> >     > pg 1.323 is stuck inactive for 73352.498493, current state peering, last acting [595,1391,240,127,937,362,267,320,7,634,716]
>> >     >
>> >     > Restarting OSDs or hosts does nothing to help, or sometimes results in things like this:
>> >     >
>> >     > pg 1.323 is remapped+peering, acting [2147483647,1391,240,127,937,362,267,320,7,634,716]
>> >     >
>> >     >
>> >     > The host that was rebooted is home to osd.7 (rank 8 in the PG). If I go onto it to look at the logs for osd.7, this is what I see:
>> >     >
>> >     > $ tail -f /var/log/ceph/ceph-osd.7.log
>> >     > 2017-02-08 15:41:00.445247 7f5fcc2bd700  0 -- XXX.XXX.XXX.172:6905/20510 >> XXX.XXX.XXX.192:6921/55371 pipe(0x7f6074a0b400 sd=34 :42828 s=2 pgs=319 cs=471 l=0 c=0x7f6070086700).fault, initiating reconnect
>> >     >
>> >     > I'm assuming that in IP1:port1/PID1 >> IP2:port2/PID2 the >> indicates the direction of communication. I've traced these to osd.7 (rank 8 in the stuck PG) reaching out to osd.595 (the primary in the stuck PG).
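
To make tracing these connections less error-prone, the two endpoints can be pulled out of each line mechanically. The following is a rough Python sketch based only on the two log lines quoted in this thread; the regular expression and the field names (`first`, `second`, `nonce`) are my own labels, not an official description of the log format, and the number after the slash is read as a PID only because the post assumes so.

```python
import re
from typing import NamedTuple, Optional


class Endpoint(NamedTuple):
    host: str
    port: int
    nonce: int  # the post interprets this as the daemon's PID


class PipeLine(NamedTuple):
    first: Endpoint   # address printed before '>>'
    second: Endpoint  # address printed after '>>'
    message: str      # text following 'pipe(...).'


_EP = r"(\S+):(\d+)/(\d+)"
_LINE = re.compile(rf"-- {_EP} >> {_EP} pipe\(.*?\)\.(.*)$")


def parse_pipe_line(line: str) -> Optional[PipeLine]:
    """Extract both endpoints and the trailing message, or None if no match."""
    m = _LINE.search(line)
    if m is None:
        return None
    h1, p1, n1, h2, p2, n2, msg = m.groups()
    return PipeLine(
        Endpoint(h1, int(p1), int(n1)),
        Endpoint(h2, int(p2), int(n2)),
        msg.strip(),
    )


sample = (
    "2017-02-08 15:41:00.445247 7f5fcc2bd700  0 -- "
    "XXX.XXX.XXX.172:6905/20510 >> XXX.XXX.XXX.192:6921/55371 "
    "pipe(0x7f6074a0b400 sd=34 :42828 s=2 pgs=319 cs=471 l=0 "
    "c=0x7f6070086700).fault, initiating reconnect"
)
print(parse_pipe_line(sample))
```

Lines that do not contain a `pipe(...)` section, such as the `bad crc` message, simply return None, so the function can be mapped over a whole log file.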
>> >     >
>> >     > Meanwhile, looking at the logs of osd.595 I see this:
>> >     >
>> >     > $ tail -f /var/log/ceph/ceph-osd.595.log
>> >     > 2017-02-08 15:41:15.760708 7f1765673700  0 -- XXX.XXX.XXX.192:6921/55371 >> XXX.XXX.XXX.172:6905/20510 pipe(0x7f17b2911400 sd=101 :6921 s=0 pgs=0 cs=0 l=0 c=0x7f17b7beaf00).accept connect_seq 478 vs existing 477 state standby
>> >     > 2017-02-08 15:41:20.768844 7f1765673700  0 bad crc in front 1941070384 != exp 3786596716
>> >     >
>> >     > which again shows osd.595 reaching out to osd.7. From what I could gather, the CRC problem relates to messaging.
>> >     >
>> >     > Google searching has yielded nothing particularly useful on how to get this unstuck.
>> >     >
>> >     > `ceph pg 1.323 query` seems to hang forever, but it completed once last night and I noticed this:
>> >     >
>> >     >            "peering_blocked_by_detail": [
>> >     >                {
>> >     >                    "detail": "peering_blocked_by_history_les_bound"
>> >     >                }
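
Since the query only completes occasionally, it may help to save its JSON output whenever it does and extract the blocking reasons from the saved copy. A minimal Python sketch follows: it searches the whole document for the `peering_blocked_by_detail` key instead of assuming where it sits. The sample document is a hypothetical, heavily trimmed stand-in; only the `peering_blocked_by_detail` fragment comes from the real output quoted above.

```python
import json


def find_key(node, key):
    """Yield every value stored under `key` anywhere in nested JSON data."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


def blocking_details(pg_query_json):
    """Collect the 'detail' strings from all peering_blocked_by_detail lists."""
    doc = json.loads(pg_query_json)
    details = []
    for value in find_key(doc, "peering_blocked_by_detail"):
        if not isinstance(value, list):
            continue
        for entry in value:
            if isinstance(entry, dict) and "detail" in entry:
                details.append(entry["detail"])
    return details


# Hypothetical wrapper around the fragment quoted in the post.
sample = """
{
  "state": "peering",
  "recovery_state": [
    {
      "peering_blocked_by_detail": [
        {"detail": "peering_blocked_by_history_les_bound"}
      ]
    }
  ]
}
"""
print(blocking_details(sample))
```

An empty list means the key was absent or had no entries, which distinguishes "not blocked for this reason" from a query that never returned.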
>> >     >
>> >     > We have seen this before, and it was cleared by setting osd_find_best_info_ignore_history_les to true for the first two OSDs on the stuck PGs (this was on a 3-replica pool). This hasn't worked in this case, and I suspect the option needs to be set on either a majority of the OSDs or on at least k of them, so that enough OSDs can use their data and ignore history.
>> >     >
>> >     > We would really appreciate any guidance and/or help the community can offer!
>> >     >
>> >     > _______________________________________________
>> >     > ceph-users mailing list
>> >     > ceph-users@xxxxxxxxxxxxxx
>> >     > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>> >
>> >
>> >     --
>> >     Cheers,
>> >     Brad
>> >
>> >
>> 
>> 
>> 
>> --
>> Cheers,
>> Brad