Hi Jeff,

It would probably be wise to first check what these slow requests are:

1) ceph health detail -> This will tell you which OSDs are experiencing the slow requests.
2) ceph daemon osd.{id} dump_ops_in_flight -> Issued on one of the OSDs identified above, this will tell you what these ops are waiting for.

My guess is that either you have a network problem, or some other drives in your cluster are about to die or are experiencing write errors, causing retries and slowing request processing.

Just to be sure, if your drives are SMART capable, use smartctl to look at the stats for the drives you have identified in the steps above.

Regards
JC

> On Nov 20, 2014, at 06:00, Jeff <jeff@xxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> We have a five node cluster that has been running for a long
> time (over a year). A few weeks ago we upgraded to 0.87 (giant) and
> things continued to work well.
>
> Last week a drive failed on one of the nodes. We replaced the
> drive and things were working well again.
>
> After about six days we started getting lots of "slow
> requests...blocked for..." messages (100's/hour) and performance has been
> terrible. Since then we've made sure to have all of the latest OS patches
> and rebooted all five nodes. We are still seeing a lot of slow
> requests/blocked messages. Any idea(s) on what's wrong/where to look?
>
> Thanks!
> Jeff
> --
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
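
The two diagnostic steps suggested in the reply can be chained roughly as in the sketch below. This is only an illustration, not a tested procedure for any particular release: the helper name osds_from_health is made up here, and the filter assumes that the relevant lines of "ceph health detail" contain the word "blocked" or "slow" together with an osd.N token, which may differ between Ceph versions.

```shell
#!/bin/sh
# Hypothetical helper: read "ceph health detail" text on stdin and print
# the unique osd.N names found on lines mentioning blocked or slow requests.
# Assumption: such lines contain "blocked" or "slow" plus an "osd.N" token.
osds_from_health() {
    grep -iE 'blocked|slow' | grep -oE 'osd\.[0-9]+' | sort -u
}

# Only query a live cluster when the ceph CLI is actually available.
if command -v ceph >/dev/null 2>&1; then
    for osd in $(ceph health detail | osds_from_health); do
        echo "== $osd =="
        # "ceph daemon" talks to the local admin socket, so this only works
        # on the host where that OSD is running.
        ceph daemon "$osd" dump_ops_in_flight
    done
fi

# Then inspect the drive behind a suspect OSD, for example:
#   smartctl -a /dev/sdX    # /dev/sdX is a placeholder device name
```

Since dump_ops_in_flight has to be issued on the node hosting each OSD, in practice the loop would be run once per node, or the OSD list would be split up by host first.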