Hi Sage,

Thank you for your help. My original issue with slow ops on OSD restarts is gone too, even with default values for paxos_propose_interval. It's a bit annoying that I spent many hours debugging this and in the end had only missed one step in the upgrade. Only during the update itself, until require_osd_release is set to the new version, will there be interruptions.

Regards
Manuel

________________________________
From: Sage Weil <sage@xxxxxxxxxxxx>
Sent: Tuesday, 9 November 2021 17:29
To: Manuel Lausch
Subject: Re: Re: OSD spend too much time on "waiting for readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart

Yeah, I think that is the problem. The field that is getting updated by prepare_beacon is new in octopus, so if your osdmap still has require_osd_release=nautilus then it is trying to set it but it is not getting encoded (for compatibility). Doing `ceph osd require_osd_release octopus` should resolve this.

On Tue, Nov 9, 2021 at 9:01 AM Sage Weil <sage@xxxxxxxxxxxx> wrote:

What version are you running? I thought it was pacific or octopus, but the osdmap says "require_osd_release": "nautilus", which implies the upgrade procedure wasn't finished?

sage

On Tue, Nov 9, 2021 at 8:08 AM Manuel Lausch <manuel.lausch@xxxxxxxx> wrote:

As far as I can see, the maps differ only in the epoch and creation date, nothing else. I dumped some maps and uploaded them for you: 1f1e1e5e-1c1c-470b-b691-ed820687bab8

On this cluster I don't create snapshots regularly. For some weeks now, no snapshots have been present.

Please let me know if you need further information.

Regards
Manuel

On Tue, 9 Nov 2021 07:40:29 -0600
Sage Weil <sage@xxxxxxxxxxxx> wrote:

> Are you sure consecutive maps are identical? Can you get the latest
> epoch ('ceph osd stat'), and then dump a few consecutive ones? e.g.
>
> ceph osd dump 1000 -f json-pretty > 1000
> ceph osd dump 1001 -f json-pretty > 1001
> ceph osd dump 1002 -f json-pretty > 1002
> ceph osd dump 1003 -f json-pretty > 1003
>
> ...and ceph-post-file those? Based on the logs I think the delta is
> related to snap trimming, but want to confirm.
>
> Thanks!
> sage

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
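For anyone landing on this thread later, the check-and-fix discussed above can be sketched as the following commands (assumptions: a running cluster with an admin keyring, and "octopus" as the target release of this particular upgrade; substitute your own):

```shell
# Show which release the osdmap currently requires. After a finished
# upgrade this should print the new release name, not the old one.
ceph osd dump | grep require_osd_release

# If it still reports the previous release (here: "nautilus"),
# finalize the upgrade so that new osdmap fields actually get encoded:
ceph osd require_osd_release octopus
```

Until `require_osd_release` is bumped, new fields (such as the one updated by prepare_beacon) are silently dropped from the osdmap for compatibility, which is what caused the "waiting for readable" slow ops described in this thread.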