Re: in retrospect get OSD for "slow requests are blocked" ? / get detailed health status via librados?

Brad Hubbard <bhubbard@xxxxxxxxxx> · Thu, 17 May 2018 17:30:26 +1000

On Thu, May 17, 2018 at 4:16 PM, Uwe Sauter <uwe.sauter.de@xxxxxxxxx> wrote:
> Hi,
>
>>> I'm currently chewing on an issue regarding "slow requests are blocked".
>>> I'd like to identify the OSD that is causing those events
>>> once the cluster is back to HEALTH_OK (as I have no monitoring yet that
>>> would get this info in realtime).
>>>
>>> Collecting this information could help identify aging disks if you were
>>> able to accumulate and analyze which OSD had blocking
>>> requests in the past and how often those events occur.
>>>
>>> My research so far lets me think that this information is only available
>>> as long as the requests are actually blocked. Is this
>>> correct?
>>
>>
>> You don't give any indication what version you are running but see
>> https://tracker.ceph.com/issues/23205
>
>
> the cluster is a Proxmox installation which is based on an Ubuntu kernel.
>
> # ceph -v
> ceph version 12.2.5 (dfcb7b53b2e4fcd2a5af0240d4975adc711ab96e) luminous
> (stable)
>
> The mystery is that these blocked requests occur frequently when at least
> one of the 6 servers is booted with kernel 4.15.17; if all are running
> 4.13.16, blocked requests are infrequent and few.

Sounds like you need to profile your two kernel versions and work out
why one is under-performing.
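
As for working out after the fact which OSDs were involved, one rough
option is to tally slow-request warnings from the per-OSD log files. The
sketch below is only that, and it rests on assumptions that should be
checked against your own logs: that the OSD logs sit in one directory
under the default naming ceph-osd.<id>.log, and that the relevant warning
lines contain the text "slow request". The function name
tally_slow_requests is made up for this example. Rotated or compressed
logs are not read.

```python
import re
import sys
from collections import Counter
from pathlib import Path

# Assumption: OSD logs use the default naming ceph-osd.<id>.log.
OSD_LOG_NAME = re.compile(r"ceph-osd\.(\d+)\.log$")
# Assumption: slow-request warnings contain this text; verify in your logs.
SLOW_MARKER = "slow request"


def tally_slow_requests(log_dir):
    """Count log lines mentioning slow requests, keyed by OSD id."""
    counts = Counter()
    for path in sorted(Path(log_dir).iterdir()):
        match = OSD_LOG_NAME.search(path.name)
        if not match:
            continue  # skip non-OSD logs and rotated/compressed files
        osd_id = int(match.group(1))
        with path.open(errors="replace") as handle:
            counts[osd_id] += sum(1 for line in handle if SLOW_MARKER in line)
    return counts


if __name__ == "__main__":
    # The default path is an assumption; pass another directory as argv[1].
    log_dir = sys.argv[1] if len(sys.argv) > 1 else "/var/log/ceph"
    for osd_id, hits in tally_slow_requests(log_dir).most_common():
        print(f"osd.{osd_id}: {hits}")
```

Run on each of the 6 servers and compared between the 4.13.16 and 4.15.17
boots, the per-OSD counts would at least show whether the blocked
requests cluster on particular disks or follow the kernel.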

>
>
> Regards,
>
>         Uwe

-- 
Cheers,
Brad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com