On Thu, May 17, 2018 at 4:16 PM, Uwe Sauter <uwe.sauter.de@xxxxxxxxx> wrote:
> Hi,
>
>>> I'm currently chewing on an issue regarding "slow requests are blocked".
>>> I'd like to identify the OSD that is causing those events
>>> once the cluster is back to HEALTH_OK (as I have no monitoring yet that
>>> would get this info in real time).
>>>
>>> Collecting this information could help identify aging disks if you were
>>> able to accumulate and analyze which OSD had blocking
>>> requests in the past and how often those events occur.
>>>
>>> My research so far leads me to think that this information is only available
>>> as long as the requests are actually blocked. Is this
>>> correct?
>>
>>
>> You don't give any indication what version you are running but see
>> https://tracker.ceph.com/issues/23205
>
>
> The cluster is a Proxmox installation which is based on an Ubuntu kernel.
>
> # ceph -v
> ceph version 12.2.5 (dfcb7b53b2e4fcd2a5af0240d4975adc711ab96e) luminous (stable)
>
> The mystery is that these blocked requests occur in large numbers when at least
> one of the 6 servers is booted with kernel 4.15.17; if all are running
> 4.13.16, blocked requests are infrequent and few.

Sounds like you need to profile your two kernel versions and work out
why one is under-performing.

>
>
> Regards,
>
> Uwe

-- 
Cheers,
Brad
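
P.S. On accumulating which OSD reported blocked requests after the fact: below is a minimal sketch, assuming you have saved log text in which each warning line names the reporting daemon as "osd.<id>" and contains the phrase "slow request". The sample lines are made up for illustration and the real log layout may differ between releases, so treat the pattern and the function name as assumptions rather than anything Ceph guarantees.

```python
import re
from collections import Counter

# Assumption: the first "osd.<id>" token on a line is the daemon that
# reported the warning. Adjust the pattern to your actual log layout.
OSD_RE = re.compile(r"\bosd\.(\d+)\b")


def count_slow_request_reports(lines):
    """Count lines mentioning "slow request", keyed by OSD id."""
    counts = Counter()
    for line in lines:
        if "slow request" not in line:
            continue
        match = OSD_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Hypothetical sample lines, not copied from a real cluster log.
SAMPLE = [
    "2018-05-17 10:00:01 osd.3 [WRN] 2 slow requests, oldest blocked for > 30 secs",
    "2018-05-17 10:00:05 osd.7 [WRN] 1 slow requests, oldest blocked for > 31 secs",
    "2018-05-17 10:00:09 osd.3 [WRN] 4 slow requests, oldest blocked for > 35 secs",
    "2018-05-17 10:00:12 mon.a [INF] Health check cleared",
]

if __name__ == "__main__":
    # Print OSDs ordered by how often they reported slow requests.
    for osd_id, n in count_slow_request_reports(SAMPLE).most_common():
        print(f"osd.{osd_id}: {n}")
```

Running this per kernel version and comparing the per-OSD counts would at least show whether the extra blocked requests under 4.15.17 cluster on particular OSDs or hosts.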