Hello,

On Wed, 4 May 2016 21:08:02 +0000 Garg, Pankaj wrote:

> Hi,
>
> I am getting messages like the following from my Ceph systems. Normally
> this would indicate issues with Drives. But when I restart my system,
> different and randomly a couple OSDs again start spitting out the same
> message. SO definitely it's not the same drives every time.
>
> Any ideas on how to debug this. I don't see any drive related issues in
> dmesg log either.
>
Drives having issues (as in being slow due to errors or firmware bugs) are
a possible reason, but that would not be at the top of my list.

You want to run atop, iostat or the like and graph the actual drive
utilization and various Ceph performance counters to see what is going on:
whether a particular drive is slower than the rest, or whether your whole
system is simply reaching the limit of its performance.

Looking at your ceph log output, the first thing that catches the eye is
that all slow requests are for benchmark objects (rados bench), so you seem
to be stress testing the cluster and have found its limits...

In addition to that, all the slow requests include osd.84, so you might
give that one a closer look.
But that could of course be a coincidence due to limited log samples.
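
To check whether osd.84 really stands out once you have more than a handful of log lines, you can tally the OSD ids named after "waiting for subops from". What follows is only a rough sketch: the regular expression is my guess at the format, based purely on the lines quoted in this thread, and the function name `count_blocking_osds` is made up for illustration. It expects unwrapped log text, not the re-wrapped mail quote.

```python
import re
from collections import Counter

# Assumption: slow-request lines end with "waiting for subops from <id>,<id>",
# as in the log lines quoted in this thread. Other Ceph versions may differ.
SUBOPS_RE = re.compile(r"waiting for subops from ([\d,]+)")


def count_blocking_osds(log_text):
    """Count how often each OSD id is named as a pending subop target."""
    counts = Counter()
    for match in SUBOPS_RE.finditer(log_text):
        for osd_id in match.group(1).split(","):
            if osd_id:  # skip empty fields, e.g. from a trailing comma
                counts[int(osd_id)] += 1
    return counts


# The three slow-request entries from the quoted log, one per line.
sample = """\
2016-05-04 14:02:52.499115 osd.72 [WRN] slow request 30.429347 seconds old, received at 2016-05-04 14:02:22.069658: osd_op(client.2859198.0:9559 benchmark_data_x86Ceph3_54385_object9558 [write 0~131072] 309.17ee1e0e ack+ondisk+write+known_if_redirected e14815) currently waiting for subops from 84,104
2016-05-04 14:02:54.499467 osd.72 [WRN] slow request 30.660900 seconds old, received at 2016-05-04 14:02:23.838455: osd_op(client.2859198.0:9661 benchmark_data_x86Ceph3_54385_object9660 [write 0~131072] 309.4054960e ack+ondisk+write+known_if_redirected e14815) currently waiting for subops from 84,104
2016-05-04 14:02:56.499835 osd.72 [WRN] slow request 30.940457 seconds old, received at 2016-05-04 14:02:25.559273: osd_op(client.2859197.0:9796 benchmark_data_x86Ceph1_24943_object9795 [write 0~131072] 308.7e0944a ack+ondisk+write+known_if_redirected e14815) currently waiting for subops from 84,97
"""

for osd_id, hits in count_blocking_osds(sample).most_common():
    print(f"osd.{osd_id}: {hits}")
```

On the three entries above this lists osd.84 three times, osd.104 twice and osd.97 once. Run it over a full log file from a longer benchmark before drawing conclusions; if the counts even out across OSDs, you are more likely looking at a cluster-wide limit than at one bad drive.
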

Christian

> Thanks
> Pankaj
>
>
>
> 2016-05-04 14:02:52.499115 osd.72 [WRN] slow request 30.429347 seconds old, received at 2016-05-04 14:02:22.069658: osd_op(client.2859198.0:9559 benchmark_data_x86Ceph3_54385_object9558 [write 0~131072] 309.17ee1e0e ack+ondisk+write+known_if_redirected e14815) currently waiting for subops from 84,104
> 2016-05-04 14:02:54.499453 osd.72 [WRN] 24 slow requests, 1 included below; oldest blocked for > 52.866778 secs
> 2016-05-04 14:02:54.499467 osd.72 [WRN] slow request 30.660900 seconds old, received at 2016-05-04 14:02:23.838455: osd_op(client.2859198.0:9661 benchmark_data_x86Ceph3_54385_object9660 [write 0~131072] 309.4054960e ack+ondisk+write+known_if_redirected e14815) currently waiting for subops from 84,104
> 2016-05-04 14:02:56.499822 osd.72 [WRN] 25 slow requests, 1 included below; oldest blocked for > 54.867154 secs
> 2016-05-04 14:02:56.499835 osd.72 [WRN] slow request 30.940457 seconds old, received at 2016-05-04 14:02:25.559273: osd_op(client.2859197.0:9796 benchmark_data_x86Ceph1_24943_object9795 [write 0~131072] 308.7e0944a ack+ondisk+write+known_if_redirected e14815) currently waiting for subops from 84,97
> 2016-05-04 14:02:59.140562 osd.84 [WRN] 33 slow requests, 1 included below; oldest blocked for > 58.267177 secs
>
>

-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/