On Wed, May 16, 2018 at 6:16 PM, Uwe Sauter <uwe.sauter.de@xxxxxxxxx> wrote:
> Hi folks,
>
> I'm currently chewing on an issue regarding "slow requests are blocked". I'd like to identify the OSD that is causing those events
> once the cluster is back to HEALTH_OK (as I have no monitoring yet that would get this info in realtime).
>
> Collecting this information could help identify aging disks if you were able to accumulate and analyze which OSD had blocking
> requests in the past and how often those events occur.
>
> My research so far leads me to think that this information is only available as long as the requests are actually blocked. Is this
> correct?

You don't give any indication of what version you are running, but see https://tracker.ceph.com/issues/23205

>
> MON logs only show that those events occur and how many requests are in blocking state but no indication of which OSD is
> affected. Is there a way to identify blocking requests from the OSD log files?
>
>
> On a side note: I was trying to write a small Python script that would extract this kind of information in realtime but while I
> was able to register a MonitorLog callback that would receive the same messages as you would get with "ceph -w" I haven't seen in
> the librados Python bindings documentation the possibility to do the equivalent of "ceph health detail". Any suggestions on how to
> get the blocking OSDs via librados?
>
>
> Thanks,
>
> Uwe
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Cheers,
Brad
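
On the librados side question, one possible approach is to send the monitor the same JSON command that the "ceph health detail" CLI sends, using Rados.mon_command() from the Python bindings, and then pull OSD ids out of the returned messages. The following is only a rough sketch under several assumptions that are not from this thread: the check names (REQUEST_SLOW, REQUEST_STUCK, SLOW_OPS), the JSON layout ("checks" -> "summary"/"detail" -> "message") and the message wording ("osd.7 has ...", "osds 2,11 have ...") all differ between Ceph releases, so please verify them against the output of your own cluster. The helper names and the conffile path are made up for illustration.

```python
import json
import re

# Assumed message shapes: "osd.12 ..." and the plural "osds 1,5,12 ...".
# These vary between Ceph releases; adjust after checking real output.
_OSD_SINGLE = re.compile(r"\bosd\.(\d+)\b")
_OSD_PLURAL = re.compile(r"\bosds\s+(\d+(?:\s*,\s*\d+)*)")

# Assumed names of the health checks that report slow/blocked requests.
SLOW_CHECKS = ("REQUEST_SLOW", "REQUEST_STUCK", "SLOW_OPS")


def osds_in_message(message):
    """Return the set of OSD ids mentioned in one health message."""
    ids = {int(m) for m in _OSD_SINGLE.findall(message)}
    for group in _OSD_PLURAL.findall(message):
        ids.update(int(part) for part in group.split(","))
    return ids


def blocked_osds(health, check_names=SLOW_CHECKS):
    """Collect OSD ids from the summary and detail of the given checks."""
    ids = set()
    for name, check in health.get("checks", {}).items():
        if name not in check_names:
            continue
        ids |= osds_in_message(check.get("summary", {}).get("message", ""))
        for item in check.get("detail", []):
            ids |= osds_in_message(item.get("message", ""))
    return sorted(ids)


def fetch_health_detail(conffile="/etc/ceph/ceph.conf"):
    """Ask the monitors for 'health detail' as JSON (needs a live cluster)."""
    import rados  # imported lazily so the parsing helpers work without it

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        cmd = json.dumps({"prefix": "health", "detail": "detail", "format": "json"})
        ret, outbuf, outs = cluster.mon_command(cmd, b"")
        if ret != 0:
            raise RuntimeError("mon_command failed: %s" % outs)
        return json.loads(outbuf.decode("utf-8"))
    finally:
        cluster.shutdown()
```

Calling blocked_osds(fetch_health_detail()) from a timer, or from the existing MonitorLog callback, and appending the result with a timestamp to a file would give a history of which OSDs had blocked requests. It still only sees requests while they are blocked, so it has to be running at the time of the event.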