Re: in retrospect get OSD for "slow requests are blocked" ? / get detailed health status via librados?

Brad Hubbard <bhubbard@xxxxxxxxxx> · Thu, 17 May 2018 12:10:39 +1000



On Wed, May 16, 2018 at 6:16 PM, Uwe Sauter <uwe.sauter.de@xxxxxxxxx> wrote:
> Hi folks,
>
> I'm currently chewing on an issue regarding "slow requests are blocked". I'd like to identify the OSD that is causing those events
> once the cluster is back to HEALTH_OK (as I have no monitoring yet that would get this info in realtime).
>
> Collecting this information could help identify aging disks if you were able to accumulate and analyze which OSD had blocking
> requests in the past and how often those events occur.
>
> My research so far leads me to think that this information is only available as long as the requests are actually blocked. Is this
> correct?

You don't say which version you are running, but see
https://tracker.ceph.com/issues/23205

>
> MON logs only show that those events occur and how many requests are in a blocked state, but give no indication of which OSD is
> affected. Is there a way to identify blocking requests from the OSD log files?
>
>
> On a side note: I was trying to write a small Python script that would extract this kind of information in realtime but while I
> was able to register a MonitorLog callback that would receive the same messages as you would get with "ceph -w" I haven't seen in
> the librados Python bindings documentation the possibility to do the equivalent of "ceph health detail". Any suggestions on how to
> get the blocking OSDs via librados?
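
On the librados question, here is a minimal sketch rather than a tested recipe. It assumes the Python bindings' Rados.mon_command() accepts the same JSON command the CLI sends for "ceph health detail". It also assumes the JSON reply has the Luminous-style layout, with a "checks" map whose entries carry "detail" messages such as "osd.3 has blocked requests" or "osds 1,4 have blocked requests". The names fetch_health_detail and blocked_osds are made up for illustration, and the message wording differs between releases, so adjust the patterns to what your version prints.

```python
import json
import re


def fetch_health_detail(conffile="/etc/ceph/ceph.conf"):
    """Ask the monitors for the equivalent of 'ceph health detail' as JSON."""
    import rados  # imported here so the parsing helper below works without it

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        cmd = json.dumps({"prefix": "health", "detail": "detail", "format": "json"})
        ret, outbuf, outs = cluster.mon_command(cmd, b"")
        if ret != 0:
            raise RuntimeError("health command failed: %s" % outs)
        return json.loads(outbuf)
    finally:
        cluster.shutdown()


def blocked_osds(health, check="REQUEST_SLOW"):
    """Collect OSD ids named in the detail messages of one health check.

    ASSUMPTION: layout is {"checks": {name: {"detail": [{"message": ...}]}}}
    and messages look like "osd.3 has ..." or "osds 1,4 have ...".
    """
    osds = set()
    details = health.get("checks", {}).get(check, {}).get("detail", [])
    for item in details:
        msg = item.get("message", "")
        # Single-OSD form: "osd.3 has blocked requests ..."
        osds.update(int(n) for n in re.findall(r"\bosd\.(\d+)", msg))
        # Multi-OSD form: "osds 1,4 have blocked requests ..."
        for group in re.findall(r"\bosds ([\d,]+)", msg):
            osds.update(int(n) for n in group.split(",") if n)
    return sorted(osds)
```

Polling blocked_osds(fetch_health_detail()) on a timer and logging the result with a timestamp would give you the per-OSD history you are after, though only for events that are still active at poll time.
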
>
>
> Thanks,
>
>         Uwe
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 
Cheers,
Brad