Thanks. I should have mentioned that the errors are pretty well
distributed across the cluster:

ceph1: /var/log/ceph/ceph-osd.0.log    71
ceph1: /var/log/ceph/ceph-osd.1.log    112
ceph1: /var/log/ceph/ceph-osd.2.log    38
ceph2: /var/log/ceph/ceph-osd.3.log    88
ceph2: /var/log/ceph/ceph-osd.4.log    54
ceph3: /var/log/ceph/ceph-osd.5.log    36
ceph3: /var/log/ceph/ceph-osd.6.log    48
ceph3: /var/log/ceph/ceph-osd.7.log    39
ceph3: /var/log/ceph/ceph-osd.8.log    40
ceph4: /var/log/ceph/ceph-osd.10.log   95
ceph4: /var/log/ceph/ceph-osd.9.log    139
ceph5: /var/log/ceph/ceph-osd.11.log   81
ceph5: /var/log/ceph/ceph-osd.12.log   393

I'll try to catch them while they're happening and see what I can learn.

Thanks again!!

Jeff

On Thu, Nov 20, 2014 at 06:40:57AM -0800, Jean-Charles LOPEZ wrote:
> Hi Jeff,
>
> it would probably be wise to first check what these slow requests are:
> 1) ceph health detail -> This will tell you which OSDs are experiencing the slow requests
> 2) ceph daemon osd.{id} dump_ops_in_flight -> Issued on one of the above OSDs, this will tell you what these ops are waiting for.
>
> My fair guess is that either you have a network problem or some other drives in your cluster are about to die or are experiencing write errors causing retries and slowing the request processing.
>
> Just to be sure, if your drives are SMART capable, use smartctl to look at the stats for the drives you will have potentially identified in the steps above.
>
> Regards
> JC
>
>
> > On Nov 20, 2014, at 06:00, Jeff <jeff@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > We have a five node cluster that has been running for a long
> > time (over a year). A few weeks ago we upgraded to 0.87 (giant) and
> > things continued to work well.
> >
> > Last week a drive failed on one of the nodes. We replaced the
> > drive and things were working well again.
> >
> > After about six days we started getting lots of "slow
> > requests...blocked for..." messages (100's/hour) and performance has been
> > terrible.
> > Since then we've made sure to have all of the latest OS patches
> > and rebooted all five nodes. We are still seeing a lot of slow
> > requests/blocked messages. Any idea(s) on what's wrong/where to look?
> >
> > Thanks!
> > Jeff
> > --
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

--
===============================================================================
Jeff's Used Movie Finder
http://www.usedmoviefinder.com
email: jeff@xxxxxxxxxxxxxxxxxxx
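
For anyone wanting to produce per-OSD counts like the ones at the top of this message, here is a minimal sketch. It assumes the relevant log entries contain the substring "slow request" and that the logs live under /var/log/ceph/ as in the listing; the message does not say how the counts were actually gathered. The function names `count_matching_lines` and `count_per_log` are hypothetical helpers, not part of Ceph.

```python
import glob
from collections import Counter

# Assumption: a slow-request log entry contains this substring.
MARKER = "slow request"


def count_matching_lines(lines, marker=MARKER):
    """Return how many of the given lines contain the marker substring."""
    return sum(1 for line in lines if marker in line)


def count_per_log(pattern="/var/log/ceph/ceph-osd.*.log", marker=MARKER):
    """Map each log file matching the glob pattern to its count of marker lines."""
    counts = Counter()
    for path in sorted(glob.glob(pattern)):
        # errors="replace" keeps the scan going if a log holds undecodable bytes.
        with open(path, errors="replace") as handle:
            counts[path] = count_matching_lines(handle, marker)
    return counts


if __name__ == "__main__":
    # Print the busiest logs first, one "path count" pair per line.
    for path, count in count_per_log().most_common():
        print(f"{path} {count}")
```

Run on each node, this prints one line per OSD log, which may make it easier to pick the OSDs to inspect with `ceph daemon osd.{id} dump_ops_in_flight`.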