Re: OSD - Slow Requests

Hello,

On Wed, 4 May 2016 21:08:02 +0000 Garg, Pankaj wrote:

> Hi,
> 
> I am getting messages like the following from my Ceph systems. Normally
> this would indicate issues with drives. But when I restart my system,
> a different, seemingly random couple of OSDs start spitting out the same
> message again. So it's definitely not the same drives every time.
> 
> Any ideas on how to debug this? I don't see any drive-related issues in
> the dmesg log either.
>

Drives having issues (as in being slow due to errors or firmware bugs)
are a possible reason, but that would not be at the top of my list.

You want to run atop, iostat or the like, and graph actual drive and
various Ceph performance counters to see what is going on: whether a
particular drive is slower than the rest, or whether your whole system is
simply reaching the limit of its performance.
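
For the drive side, a minimal Python sketch of the per-device comparison
that iostat -x makes, computed from two snapshots of /proc/diskstats. It
assumes the standard field layout (counting from 1: reads completed in
field 4, ms reading in 7, writes completed in 8, ms writing in 11, ms
spent doing I/O in 13); the function names are made up for illustration.
A drive whose await sits well above its peers under the same load is the
one to look at.

```python
import time
from typing import Dict, Tuple

# 0-based positions after splitting a /proc/diskstats line on whitespace.
NAME, READS, READ_MS, WRITES, WRITE_MS, IO_TICKS = 2, 3, 6, 7, 10, 12


def parse_diskstats(text: str) -> Dict[str, Tuple[int, ...]]:
    """Map device name -> (reads, read_ms, writes, write_ms, io_ticks_ms)."""
    stats: Dict[str, Tuple[int, ...]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or malformed lines
        stats[fields[NAME]] = tuple(
            int(fields[i]) for i in (READS, READ_MS, WRITES, WRITE_MS, IO_TICKS)
        )
    return stats


def disk_report(before: str, after: str, interval_ms: float) -> Dict[str, Tuple[float, float]]:
    """Map device name -> (util_percent, await_ms) over the sampling interval."""
    old, new = parse_diskstats(before), parse_diskstats(after)
    report: Dict[str, Tuple[float, float]] = {}
    for dev, (r1, rms1, w1, wms1, t1) in new.items():
        if dev not in old:
            continue  # device appeared between the two snapshots
        r0, rms0, w0, wms0, t0 = old[dev]
        ios = (r1 - r0) + (w1 - w0)
        wait_ms = (rms1 - rms0) + (wms1 - wms0)
        util = 100.0 * (t1 - t0) / interval_ms  # share of wall time the device was busy
        avg_wait = wait_ms / ios if ios else 0.0  # mean time per completed I/O
        report[dev] = (util, avg_wait)
    return report


def sample_disks(interval_s: float = 5.0) -> Dict[str, Tuple[float, float]]:
    """Take two snapshots of /proc/diskstats (Linux only) and compare them."""
    with open("/proc/diskstats") as fh:
        first = fh.read()
    start = time.monotonic()
    time.sleep(interval_s)
    with open("/proc/diskstats") as fh:
        second = fh.read()
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return disk_report(first, second, elapsed_ms)
```

On a live OSD host, call sample_disks(5.0) while the benchmark runs and
sort the result by await to see which devices stand out.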

Looking at your ceph log output, the first thing that catches the eye is
that all slow objects are from benchmark runs (rados bench), so you seem to
be stress testing the cluster and have found its limits...
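
To check how much of the slow traffic is benchmark I/O over a longer log,
a quick sketch that pulls the object name out of each slow request warning
and groups the rados bench objects by run. It assumes one warning per line,
as in the raw ceph.log (not the wrapped copy quoted below), and the
benchmark_data_<host>_<pid>_object<N> naming visible in those lines; the
helper name is hypothetical.

```python
import re
from collections import Counter
from typing import Tuple

# Object name inside osd_op(...) in an OSD "slow request" warning.
OBJ_RE = re.compile(r"osd_op\(client\.\S+\s+(\S+)\s+\[")
# rados bench object names as seen in this log: benchmark_data_<host>_<pid>_object<N>
BENCH_RE = re.compile(r"(benchmark_data_.+)_object\d+$")


def slow_objects_by_run(log_text: str) -> Tuple[Counter, int]:
    """Count slow-request objects per bench run; also return the non-bench count."""
    runs: Counter = Counter()
    other = 0
    for obj in OBJ_RE.findall(log_text):
        m = BENCH_RE.match(obj)
        if m:
            runs[m.group(1)] += 1  # drop the per-object suffix to identify the run
        else:
            other += 1
    return runs, other
```

A non-zero second value would mean real client I/O is being delayed too,
not just the benchmark.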

In addition to that, all the slow requests include osd.84, so you might
want to give that one a closer look.
But that could of course be a coincidence due to the limited log sample.
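
To see whether osd.84 keeps turning up in a bigger sample, a rough sketch
that tallies the reporting OSD and the OSDs named in "currently waiting for
subops from ..." for each slow request, and lists the OSDs present in every
one. Same assumptions as above (one warning per line, hypothetical helper
names); it only looks at requests that are waiting for subops.

```python
import re
from collections import Counter
from typing import List, Tuple

# Reporting (primary) OSD and replica list from a "slow request" warning line.
SUBOPS_RE = re.compile(
    r"osd\.(\d+) \[WRN\] slow request .*?waiting for subops from ([\d,]+)"
)


def subop_osds(log_text: str) -> Tuple[Counter, Counter, int]:
    """Return (primary counts, replica counts, number of slow requests seen)."""
    primaries: Counter = Counter()
    replicas: Counter = Counter()
    total = 0
    for m in SUBOPS_RE.finditer(log_text):
        total += 1
        primaries[int(m.group(1))] += 1
        for osd in m.group(2).split(","):
            replicas[int(osd)] += 1
    return primaries, replicas, total


def in_every_slow_request(log_text: str) -> List[int]:
    """OSDs that appear (as primary or replica) in every slow request found."""
    primaries, replicas, total = subop_osds(log_text)
    involved = primaries + replicas  # an OSD appears at most once per request
    return sorted(osd for osd, n in involved.items() if total and n == total)
```

On the three requests quoted below this returns [72, 84]; osd.72 shows up
only because it happens to be the primary for all three, which is exactly
the limited-sample caveat.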

Christian

> Thanks
> Pankaj
> 
> 
> 
> 2016-05-04 14:02:52.499115 osd.72 [WRN] slow request 30.429347 seconds old, received at 2016-05-04 14:02:22.069658: osd_op(client.2859198.0:9559 benchmark_data_x86Ceph3_54385_object9558 [write 0~131072] 309.17ee1e0e ack+ondisk+write+known_if_redirected e14815) currently waiting for subops from 84,104
> 2016-05-04 14:02:54.499453 osd.72 [WRN] 24 slow requests, 1 included below; oldest blocked for > 52.866778 secs
> 2016-05-04 14:02:54.499467 osd.72 [WRN] slow request 30.660900 seconds old, received at 2016-05-04 14:02:23.838455: osd_op(client.2859198.0:9661 benchmark_data_x86Ceph3_54385_object9660 [write 0~131072] 309.4054960e ack+ondisk+write+known_if_redirected e14815) currently waiting for subops from 84,104
> 2016-05-04 14:02:56.499822 osd.72 [WRN] 25 slow requests, 1 included below; oldest blocked for > 54.867154 secs
> 2016-05-04 14:02:56.499835 osd.72 [WRN] slow request 30.940457 seconds old, received at 2016-05-04 14:02:25.559273: osd_op(client.2859197.0:9796 benchmark_data_x86Ceph1_24943_object9795 [write 0~131072] 308.7e0944a ack+ondisk+write+known_if_redirected e14815) currently waiting for subops from 84,97
> 2016-05-04 14:02:59.140562 osd.84 [WRN] 33 slow requests, 1 included below; oldest blocked for > 58.267177 secs
> 
> 
> 


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


