On 11.06.21 at 11:48, Dan van der Ster wrote:
> On Fri, Jun 11, 2021 at 11:08 AM Peter Lieven <pl@xxxxxxx> wrote:
>> On 10.06.21 at 17:45, Manuel Lausch wrote:
>>> Hi Peter,
>>>
>>> your suggestion pointed me to the right spot.
>>> I didn't know about the feature that Ceph will read from replica
>>> PGs.
>>>
>>> So I found two functions in osd/PrimaryLogPG.cc:
>>> "check_laggy" and "check_laggy_requeue". Both first check whether
>>> the partners have the octopus features; if not, the function is
>>> skipped. This explains why the problem began after about half of
>>> the cluster was updated.
>>>
>>> To verify this, I added "return true" in the first line of both
>>> functions, and the issue is gone with that. But I don't know what
>>> problems this could trigger, and I know the root cause is not
>>> fixed by it. I think I will open a bug ticket with this knowledge.
>>
>> I wonder if I faced the same issue. The issue I had occurred when
>> OSDs came back up and peering started.
>>
>> My cluster was a fresh octopus install, so I think the min osd
>> release was set to octopus.
>>
>> Is it generally safe to stay with this switch at nautilus while
>> running octopus, in order to run a maintained release?
>
> I would guess not -- one should follow the upgrade instructions as
> documented. (I was merely confirming that Manuel had indeed followed
> that procedure.)
>
> IMHO this issue, as described by Manuel, is not understood.
>
> Manuel, could you create a tracker and upload any relevant logs from
> when the slow requests began?

I would be interested whether the time is also spent in the
"waiting for readable" state. To check this, I reduced the slow ops
timeout to 1 second and then fetched the info about the slow ops from
the affected OSD daemons.

Manuel, can you check this as well?

Thanks,
Peter
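
PS: In case a sketch helps to picture what Manuel describes, below is a
minimal, self-contained C++ model of the gating logic as I understand it
from his mail -- it is NOT the actual osd/PrimaryLogPG.cc code. The Peer
struct, the feature-bit value, and the helper names are made up for
illustration; only the shape of the check (skip it when not all peers
advertise the octopus feature, otherwise possibly park the op as
"waiting for readable") and the spot where the "return true" hack goes
are taken from his description.

#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical stand-in for an OSD peer in the acting set.
struct Peer {
  uint64_t features;      // feature bits advertised by this OSD
  bool     readable_now;  // whether it currently holds a valid read lease (modelled)
};

// Placeholder bit, not the real SERVER_OCTOPUS feature value.
constexpr uint64_t FEATURE_SERVER_OCTOPUS = 1ULL << 57;

// Returns true if the op may proceed, false if it would be queued as
// "waiting for readable".
bool check_laggy(const std::vector<Peer>& acting_set)
{
  // Manuel's diagnostic hack would be an unconditional "return true;"
  // right here, i.e. never park the request.

  // Pre-octopus peers do not know about read leases, so the check is
  // skipped for mixed clusters -- matching the observation that the
  // problem only started once about half of the cluster was upgraded.
  for (const auto& p : acting_set) {
    if (!(p.features & FEATURE_SERVER_OCTOPUS)) {
      return true;  // mixed cluster: skip the laggy check entirely
    }
  }

  // All peers speak octopus: only proceed if everything is readable now.
  for (const auto& p : acting_set) {
    if (!p.readable_now) {
      return false;  // op would wait in "waiting for readable"
    }
  }
  return true;
}

int main()
{
  std::vector<Peer> mixed    = {{0, true}, {FEATURE_SERVER_OCTOPUS, false}};
  std::vector<Peer> upgraded = {{FEATURE_SERVER_OCTOPUS, true},
                                {FEATURE_SERVER_OCTOPUS, false}};

  std::cout << "mixed cluster op proceeds:    " << check_laggy(mixed)    << "\n";  // prints 1
  std::cout << "upgraded cluster op proceeds: " << check_laggy(upgraded) << "\n";  // prints 0
}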