----- Original Message -----
From: "Butkeev Stas" <staerist@xxxxx>
To: ceph-users@xxxxxxxx, ceph-community@xxxxxxxxxxxxxx, support@xxxxxxxx
Sent: Friday, 31 July, 2015 9:10:40 PM
Subject: problem with RGW

>Hello everybody
>
>We have ceph cluster that consist of 8 host with 12 osd per each host. It's 2T SATA disks.
>In log osd.0
>
>2015-07-31 14:03:24.490774 7f2cd95c5700 0 log_channel(cluster) log [WRN] : 35 slow requests, 9 included below; oldest blocked for > 3003.952332 secs
>2015-07-31 14:03:24.490782 7f2cd95c5700 0 log_channel(cluster) log [WRN] : slow request 960.179599 seconds old, received at 2015-07-31 13:47:24.311080: osd_op(client.67321.0:7856 default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [writefull 0~0] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag points reached
>2015-07-31 14:03:24.490791 7f2cd95c5700 0 log_channel(cluster) log [WRN] : slow request 960.179357 seconds old, received at 2015-07-31 13:47:24.311323: osd_op(client.67321.0:7857 default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [writefull 0~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag points reached
>2015-07-31 14:03:24.490794 7f2cd95c5700 0 log_channel(cluster) log [WRN] : slow request 960.167539 seconds old, received at 2015-07-31 13:47:24.323141: osd_op(client.67321.0:7858 default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [write 524288~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag points reached
>2015-07-31 14:03:24.490797 7f2cd95c5700 0 log_channel(cluster) log [WRN] : slow request 960.155554 seconds old, received at 2015-07-31 13:47:24.335126: osd_op(client.67321.0:7859 default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [write 1048576~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag points reached
>2015-07-31 14:03:24.490801 7f2cd95c5700 0 log_channel(cluster) log [WRN] : slow request 960.145867 seconds old, received at 2015-07-31 13:47:24.344813: osd_op(client.67321.0:7860 default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [write 1572864~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag points reached
>2015-07-31 14:03:25.491062 7f2cd95c5700 0 log_channel(cluster) log [WRN] : 35 slow requests, 4 included below; oldest blocked for > 3004.952621 secs
>2015-07-31 14:03:25.491078 7f2cd95c5700 0 log_channel(cluster) log [WRN] : slow request 961.140790 seconds old, received at 2015-07-31 13:47:24.350178: osd_op(client.67321.0:7861 default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [write 2097152~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag points reached
>2015-07-31 14:03:25.491084 7f2cd95c5700 0 log_channel(cluster) log [WRN] : slow request 961.097870 seconds old, received at 2015-07-31 13:47:24.393098: osd_op(client.67321.0:7862 default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [write 2621440~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag points reached
>2015-07-31 14:03:25.491089 7f2cd95c5700 0 log_channel(cluster) log [WRN] : slow request 961.093229 seconds old, received at 2015-07-31 13:47:24.397740: osd_op(client.67321.0:7863 default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [write 3145728~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag points reached
>2015-07-31 14:03:25.491095 7f2cd95c5700 0 log_channel(cluster) log [WRN] : slow request 961.002957 seconds old, received at 2015-07-31 13:47:24.488012: osd_op(client.67321.0:7864 default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [write 3670016~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag points reached
>
>How I can avoid these blocked requests? What is root cause of this problem?
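
To see how widely the slow requests are spread, it can help to group the "slow request" lines from the OSD log by object. The following is only a rough sketch: the regular expression is written against the log lines quoted above and may need adjusting for other Ceph versions, and the helper names (parse_slow_requests, summarise) are hypothetical, not part of any Ceph tool. It assumes the "26.f9af7c89" field is the pool id followed by the object hash; it does not compute the PG id itself.

```python
import re
from collections import defaultdict

# Pattern fitted to the "slow request" lines quoted in this thread.
# Assumption: the field after the op list is "<pool id>.<object hash>".
SLOW_RE = re.compile(
    r"slow request (?P<age>[\d.]+) seconds old, "
    r"received at (?P<received>\S+ \S+): "
    r"osd_op\((?P<client>\S+) (?P<object>\S+) \[(?P<op>[^\]]*)\] "
    r"(?P<pool>\d+)\.(?P<hash>[0-9a-f]+) (?P<flags>\S+) e(?P<epoch>\d+)\) "
    r"currently (?P<state>.+)$"
)

# Three lines copied from the log above: one summary line, two requests.
SAMPLE = [
    "2015-07-31 14:03:24.490774 7f2cd95c5700 0 log_channel(cluster) log [WRN] : "
    "35 slow requests, 9 included below; oldest blocked for > 3003.952332 secs",
    "2015-07-31 14:03:24.490782 7f2cd95c5700 0 log_channel(cluster) log [WRN] : "
    "slow request 960.179599 seconds old, received at 2015-07-31 13:47:24.311080: "
    "osd_op(client.67321.0:7856 "
    "default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 "
    "[writefull 0~0] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) "
    "currently no flag points reached",
    "2015-07-31 14:03:24.490794 7f2cd95c5700 0 log_channel(cluster) log [WRN] : "
    "slow request 960.167539 seconds old, received at 2015-07-31 13:47:24.323141: "
    "osd_op(client.67321.0:7858 "
    "default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 "
    "[write 524288~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) "
    "currently no flag points reached",
]


def parse_slow_requests(lines):
    """Return one dict per 'slow request' line; other lines are skipped."""
    requests = []
    for line in lines:
        match = SLOW_RE.search(line.strip())
        if match:
            record = match.groupdict()
            record["age"] = float(record["age"])
            requests.append(record)
    return requests


def summarise(requests):
    """Group requests by (pool, hash, object): count, oldest age, states seen."""
    groups = defaultdict(lambda: {"count": 0, "max_age": 0.0, "states": set()})
    for req in requests:
        group = groups[(req["pool"], req["hash"], req["object"])]
        group["count"] += 1
        group["max_age"] = max(group["max_age"], req["age"])
        group["states"].add(req["state"])
    return dict(groups)


for key, info in summarise(parse_slow_requests(SAMPLE)).items():
    print(key, info["count"], info["max_age"], sorted(info["states"]))
```

In the excerpt above, every listed request targets the same object and the same 26.f9af7c89 field, so the summary collapses to a single group. If that also holds for the full log, the object name can be passed to "ceph osd map <pool-name> <object-name>" to find the PG and its acting OSDs.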

Do a "ceph pg dump" and look for the pgs in this state, ack+ondisk+write+known_if_redirected, then do a "ceph pg [pgid] query" and post the output here (if there aren't too many, otherwise a representative sample).

Also look carefully at the acting OSDs for these pgs and check the output of "ceph daemon /var/run/ceph/ceph-osd.NNN.asok dump_ops_in_flight". There could be problems with these OSDs slowing down the requests, including hardware problems, so check thoroughly.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com