Re: slow requests/blocked

Thanks.  I should have mentioned that the errors are pretty well
distributed across the cluster (number of slow request messages in
each OSD log):

ceph1: /var/log/ceph/ceph-osd.0.log       71
ceph1: /var/log/ceph/ceph-osd.1.log      112
ceph1: /var/log/ceph/ceph-osd.2.log       38
ceph2: /var/log/ceph/ceph-osd.3.log       88
ceph2: /var/log/ceph/ceph-osd.4.log       54
ceph3: /var/log/ceph/ceph-osd.5.log       36
ceph3: /var/log/ceph/ceph-osd.6.log       48
ceph3: /var/log/ceph/ceph-osd.7.log       39
ceph3: /var/log/ceph/ceph-osd.8.log       40
ceph4: /var/log/ceph/ceph-osd.10.log      95
ceph4: /var/log/ceph/ceph-osd.9.log      139
ceph5: /var/log/ceph/ceph-osd.11.log      81
ceph5: /var/log/ceph/ceph-osd.12.log     393
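
How evenly these are spread can be checked with a little arithmetic. Below is a minimal Python sketch that totals the counts above per host and finds the single largest contributor. The counts are copied from the table; `count_slow_requests` assumes each warning line contains the phrase "slow request", and all function names are hypothetical.

```python
from collections import defaultdict

# Per-OSD counts copied from the table above: (host, OSD log, count).
COUNTS = [
    ("ceph1", "ceph-osd.0.log", 71),
    ("ceph1", "ceph-osd.1.log", 112),
    ("ceph1", "ceph-osd.2.log", 38),
    ("ceph2", "ceph-osd.3.log", 88),
    ("ceph2", "ceph-osd.4.log", 54),
    ("ceph3", "ceph-osd.5.log", 36),
    ("ceph3", "ceph-osd.6.log", 48),
    ("ceph3", "ceph-osd.7.log", 39),
    ("ceph3", "ceph-osd.8.log", 40),
    ("ceph4", "ceph-osd.10.log", 95),
    ("ceph4", "ceph-osd.9.log", 139),
    ("ceph5", "ceph-osd.11.log", 81),
    ("ceph5", "ceph-osd.12.log", 393),
]

def count_slow_requests(lines):
    """Count log lines mentioning a slow request (assumed wording)."""
    return sum(1 for line in lines if "slow request" in line)

def per_host_totals(rows):
    """Sum the per-OSD counts for each host."""
    totals = defaultdict(int)
    for host, _log, count in rows:
        totals[host] += count
    return dict(totals)

def largest_share(rows):
    """Return (log name, fraction of all messages) for the busiest log."""
    total = sum(count for _host, _log, count in rows)
    _host, log, count = max(rows, key=lambda row: row[2])
    return log, count / total

if __name__ == "__main__":
    for host, total in sorted(per_host_totals(COUNTS).items()):
        print(f"{host}: {total}")
    log, share = largest_share(COUNTS)
    print(f"largest: {log} ({share:.0%} of all messages)")
```

With these numbers the five hosts total 221, 142, 163, 234 and 474 messages (1234 overall), and ceph-osd.12.log alone accounts for about 32% of them.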

I'll try to catch them while they're happening and see what I can
learn.

Thanks again!!

Jeff


On Thu, Nov 20, 2014 at 06:40:57AM -0800, Jean-Charles LOPEZ wrote:
> Hi Jeff,
> 
> It would probably be wise to first check what these slow requests are:
> 1) ceph health detail -> This will tell you which OSDs are experiencing the slow requests
> 2) ceph daemon osd.{id} dump_ops_in_flight -> Issued on the node hosting one of the above OSDs, this will tell you what these ops are waiting for.
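
The two steps above can be scripted. This is a minimal Python sketch, not a Ceph tool: it assumes that `ceph health detail` names the affected OSDs as `osd.N` on lines mentioning "blocked" or "slow", and that the `dump_ops_in_flight` JSON has an `ops` list whose entries carry `age` and `description` keys. Both formats vary between releases, so check them against your own output. The helper names are hypothetical.

```python
import json
import re
import subprocess

# Assumed wording, e.g. "3 ops are blocked > 32.768 sec on osd.12".
OSD_RE = re.compile(r"\bosd\.(\d+)\b")

def osds_from_health_detail(text):
    """Return sorted, de-duplicated OSD ids named on blocked/slow lines."""
    ids = set()
    for line in text.splitlines():
        if "blocked" in line or "slow" in line:
            ids.update(int(m) for m in OSD_RE.findall(line))
    return sorted(ids)

def oldest_ops(dump, n=5):
    """Return up to n (age, description) pairs from a parsed dump, oldest first.

    Assumption: dump looks like {"ops": [{"age": ..., "description": ...}, ...]}.
    """
    ops = dump.get("ops", [])
    ranked = sorted(ops, key=lambda op: float(op.get("age", 0)), reverse=True)
    return [(float(op.get("age", 0)), op.get("description", "")) for op in ranked[:n]]

def run_ceph(*args):
    """Run a ceph CLI command and return its stdout."""
    result = subprocess.run(["ceph", *args], capture_output=True, text=True, check=True)
    return result.stdout

def report(n=5):
    """Hypothetical helper: steps 1 and 2 combined, run on an OSD node."""
    for osd_id in osds_from_health_detail(run_ceph("health", "detail")):
        try:
            raw = run_ceph("daemon", f"osd.{osd_id}", "dump_ops_in_flight")
            dump = json.loads(raw)
        except (subprocess.CalledProcessError, json.JSONDecodeError):
            # The admin socket only exists on the node running that OSD.
            print(f"osd.{osd_id}: not reachable from this host")
            continue
        for age, desc in oldest_ops(dump, n):
            print(f"osd.{osd_id}  {age:8.1f}s  {desc}")
```

Calling `report()` on each node in turn, while the warnings are appearing, would list the oldest in-flight ops for the local OSDs that health detail flags.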
> 
> My best guess is that either you have a network problem, or some other drives in your cluster are about to die or are experiencing write errors, causing retries and slowing down request processing.
> 
> Just to be sure, if your drives are SMART capable, use smartctl to look at the stats for the drives you have potentially identified in the steps above.
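
A minimal Python sketch of the SMART check, assuming ATA-style `smartctl -A` output (columns `ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE`). The choice of watched attributes is an assumption: non-zero reallocated, pending or offline-uncorrectable sector counts are common signs of a failing disk. The helper names are hypothetical.

```python
import re
import subprocess

# Attributes whose non-zero raw value often points at a failing disk (assumption).
WATCH = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

def parse_smart_attributes(text):
    """Map attribute name -> leading integer of its RAW_VALUE column."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            m = re.match(r"\d+", fields[9])
            if m:
                attrs[fields[1]] = int(m.group())
    return attrs

def warning_signs(attrs):
    """Return the watched attributes whose raw value is non-zero."""
    return {name: attrs[name] for name in WATCH if attrs.get(name, 0) > 0}

def smart_attributes(device):
    """Hypothetical helper: run smartctl on a device such as /dev/sdb (needs root)."""
    # smartctl's exit status is a bitmask, so a non-zero code is not always fatal.
    out = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True).stdout
    return parse_smart_attributes(out)
```

For example, `warning_signs(smart_attributes("/dev/sdb"))` returns an empty dict for a disk with no reallocated or pending sectors.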
> 
> Regards
> JC
> 
> 
> 
> > On Nov 20, 2014, at 06:00, Jeff <jeff@xxxxxxxxxxxxxxxxxxx> wrote:
> > 
> > Hi,
> > 
> > 	We have a five node cluster that has been running for a long
> > time (over a year).  A few weeks ago we upgraded to 0.87 (giant) and 
> > things continued to work well.  
> > 
> > 	Last week a drive failed on one of the nodes.  We replaced the
> > drive and things were working well again.
> > 
> > 	After about six days we started getting lots of "slow
> > requests...blocked for..." messages (hundreds per hour) and performance has been
> > terrible.  Since then we've made sure to have all of the latest OS patches
> > and rebooted all five nodes.  We are still seeing a lot of slow
> > requests/blocked messages.  Any idea(s) on what's wrong/where to look?
> > 
> > Thanks!
> > 	Jeff
> > -- 
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

-- 
===============================================================================
                        Jeff's Used Movie Finder    
                     http://www.usedmoviefinder.com
                    email: jeff@xxxxxxxxxxxxxxxxxxx



