Hello everybody,

We have a Ceph cluster that consists of 8 hosts with 12 OSDs per host, all on 2 TB SATA disks.

[root@se087 ~]# ceph osd tree
ID  WEIGHT    TYPE NAME                        UP/DOWN REWEIGHT PRIMARY-AFFINITY
 -1 182.99203 root default
 -2 182.99203     region RU
 -3  91.49487         datacenter ru-msk-comp1p
 -9  22.87500             host 1
 48   1.90599                 osd.48                 up  1.00000          1.00000
 49   1.90599                 osd.49                 up  1.00000          1.00000
 50   1.90599                 osd.50                 up  1.00000          1.00000
 51   1.90599                 osd.51                 up  1.00000          1.00000
 52   1.90599                 osd.52                 up  1.00000          1.00000
 53   1.90599                 osd.53                 up  1.00000          1.00000
 54   1.90599                 osd.54                 up  1.00000          1.00000
 55   1.90599                 osd.55                 up  1.00000          1.00000
 56   1.90599                 osd.56                 up  1.00000          1.00000
 57   1.90599                 osd.57                 up  1.00000          1.00000
 58   1.90599                 osd.58                 up  1.00000          1.00000
 59   1.90599                 osd.59                 up  1.00000          1.00000
-10  22.87216             host 2
 60   1.90599                 osd.60                 up  1.00000          1.00000
 61   1.90599                 osd.61                 up  1.00000          1.00000
 62   1.90599                 osd.62                 up  1.00000          1.00000
 63   1.90599                 osd.63                 up  1.00000          1.00000
 64   1.90599                 osd.64                 up  1.00000          1.00000
 65   1.90599                 osd.65                 up  1.00000          1.00000
 66   1.90599                 osd.66                 up  1.00000          1.00000
 67   1.90599                 osd.67                 up  1.00000          1.00000
 69   1.90599                 osd.69                 up  1.00000          1.00000
 70   1.90599                 osd.70                 up  1.00000          1.00000
 71   1.90599                 osd.71                 up  1.00000          1.00000
 68   1.90627                 osd.68                 up  1.00000          1.00000
-11  22.87500             host 3
 72   1.90599                 osd.72                 up  1.00000          1.00000
 73   1.90599                 osd.73                 up  1.00000          1.00000
 74   1.90599                 osd.74                 up  1.00000          1.00000
 75   1.90599                 osd.75                 up  1.00000          1.00000
 76   1.90599                 osd.76                 up  1.00000          1.00000
 77   1.90599                 osd.77                 up  1.00000          1.00000
 78   1.90599                 osd.78                 up  1.00000          1.00000
 79   1.90599                 osd.79                 up  1.00000          1.00000
 80   1.90599                 osd.80                 up  1.00000          1.00000
 81   1.90599                 osd.81                 up  1.00000          1.00000
 82   1.90599                 osd.82                 up  1.00000          1.00000
 83   1.90599                 osd.83                 up  1.00000          1.00000
-12  22.87271             host 4
 84   1.90599                 osd.84                 up  1.00000          1.00000
 86   1.90599                 osd.86                 up  1.00000          1.00000
 89   1.90599                 osd.89                 up  1.00000          1.00000
 90   1.90599                 osd.90                 up  1.00000          1.00000
 91   1.90599                 osd.91                 up  1.00000          1.00000
 92   1.90599                 osd.92                 up  1.00000          1.00000
 93   1.90599                 osd.93                 up  1.00000          1.00000
 94   1.90599                 osd.94                 up  1.00000          1.00000
 95   1.90599                 osd.95                 up  1.00000          1.00000
 85   1.90627                 osd.85                 up  1.00000          1.00000
 88   1.90627                 osd.88                 up  1.00000          1.00000
 87   1.90627                 osd.87                 up  1.00000          1.00000
 -4  91.49716         datacenter ru-msk-vol51
 -5  22.87216             host 5
  1   1.90599                 osd.1                  up  1.00000          1.00000
  2   1.90599                 osd.2                  up  1.00000          1.00000
  3   1.90599                 osd.3                  up  1.00000          1.00000
  4   1.90599                 osd.4                  up  1.00000          1.00000
  5   1.90599                 osd.5                  up  1.00000          1.00000
  6   1.90599                 osd.6                  up  1.00000          1.00000
  7   1.90599                 osd.7                  up  1.00000          1.00000
  8   1.90599                 osd.8                  up  1.00000          1.00000
  9   1.90599                 osd.9                  up  1.00000          1.00000
 10   1.90599                 osd.10                 up  1.00000          1.00000
 11   1.90599                 osd.11                 up  1.00000          1.00000
  0   1.90627                 osd.0                  up  1.00000          1.00000
 -6  22.87500             host 6
 12   1.90599                 osd.12                 up  1.00000          1.00000
 13   1.90599                 osd.13                 up  1.00000          1.00000
 14   1.90599                 osd.14                 up  1.00000          1.00000
 15   1.90599                 osd.15                 up  1.00000          1.00000
 16   1.90599                 osd.16                 up  1.00000          1.00000
 17   1.90599                 osd.17                 up  1.00000          1.00000
 18   1.90599                 osd.18                 up  1.00000          1.00000
 19   1.90599                 osd.19                 up  1.00000          1.00000
 20   1.90599                 osd.20                 up  1.00000          1.00000
 21   1.90599                 osd.21                 up  1.00000          1.00000
 22   1.90599                 osd.22                 up  1.00000          1.00000
 23   1.90599                 osd.23                 up  1.00000          1.00000
 -7  22.87500             host 7
 24   1.90599                 osd.24                 up  1.00000          1.00000
 25   1.90599                 osd.25                 up  1.00000          1.00000
 26   1.90599                 osd.26                 up  1.00000          1.00000
 27   1.90599                 osd.27                 up  1.00000          1.00000
 28   1.90599                 osd.28                 up  1.00000          1.00000
 29   1.90599                 osd.29                 up  1.00000          1.00000
 30   1.90599                 osd.30                 up  1.00000          1.00000
 31   1.90599                 osd.31                 up  1.00000          1.00000
 32   1.90599                 osd.32                 up  1.00000          1.00000
 33   1.90599                 osd.33                 up  1.00000          1.00000
 34   1.90599                 osd.34                 up  1.00000          1.00000
 35   1.90599                 osd.35                 up  1.00000          1.00000
 -8  22.87500             host 8
 36   1.90599                 osd.36                 up  1.00000          1.00000
 37   1.90599                 osd.37                 up  1.00000          1.00000
 38   1.90599                 osd.38                 up  1.00000          1.00000
 39   1.90599                 osd.39                 up  1.00000          1.00000
 40   1.90599                 osd.40                 up  1.00000          1.00000
 41   1.90599                 osd.41                 up  1.00000          1.00000
 42   1.90599                 osd.42                 up  1.00000          1.00000
 43   1.90599                 osd.43                 up  1.00000          1.00000
 44   1.90599                 osd.44                 up  1.00000          1.00000
 45   1.90599                 osd.45                 up  1.00000          1.00000
 46   1.90599                 osd.46                 up  1.00000          1.00000
 47   1.90599                 osd.47                 up  1.00000          1.00000

Now we have two problems.

1) Blocked requests on an OSD

ceph -s reports:

.....
18 requests are blocked > 32 sec
....

ceph health detail reports:

...
1 ops are blocked > 4194.3 sec
34 ops are blocked > 2097.15 sec
1 ops are blocked > 4194.3 sec on osd.0
34 ops are blocked > 2097.15 sec on osd.0
1 osds have slow requests
....

In the log of osd.0:

2015-07-31 14:03:24.490774 7f2cd95c5700  0 log_channel(cluster) log [WRN] : 35 slow requests, 9 included below; oldest blocked for > 3003.952332 secs
2015-07-31 14:03:24.490782 7f2cd95c5700  0 log_channel(cluster) log [WRN] : slow request 960.179599 seconds old, received at 2015-07-31 13:47:24.311080: osd_op(client.67321.0:7856 default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [writefull 0~0] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag points reached
2015-07-31 14:03:24.490791 7f2cd95c5700  0 log_channel(cluster) log [WRN] : slow request 960.179357 seconds old, received at 2015-07-31 13:47:24.311323: osd_op(client.67321.0:7857 default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [writefull 0~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag points reached
2015-07-31 14:03:24.490794 7f2cd95c5700  0 log_channel(cluster) log [WRN] : slow request 960.167539 seconds old, received at 2015-07-31 13:47:24.323141: osd_op(client.67321.0:7858 default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [write 524288~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag points reached
2015-07-31 14:03:24.490797 7f2cd95c5700  0 log_channel(cluster) log [WRN] : slow request 960.155554 seconds old, received at 2015-07-31 13:47:24.335126: osd_op(client.67321.0:7859 default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [write 1048576~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag points reached
2015-07-31 14:03:24.490801 7f2cd95c5700  0 log_channel(cluster) log [WRN] : slow request 960.145867 seconds old, received at 2015-07-31 13:47:24.344813: osd_op(client.67321.0:7860 default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [write 1572864~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag points reached
2015-07-31 14:03:25.491062 7f2cd95c5700  0 log_channel(cluster) log [WRN] : 35 slow requests, 4 included below; oldest blocked for > 3004.952621 secs
2015-07-31 14:03:25.491078 7f2cd95c5700  0 log_channel(cluster) log [WRN] : slow request 961.140790 seconds old, received at 2015-07-31 13:47:24.350178: osd_op(client.67321.0:7861 default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [write 2097152~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag points reached
2015-07-31 14:03:25.491084 7f2cd95c5700  0 log_channel(cluster) log [WRN] : slow request 961.097870 seconds old, received at 2015-07-31 13:47:24.393098: osd_op(client.67321.0:7862 default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [write 2621440~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag points reached
2015-07-31 14:03:25.491089 7f2cd95c5700  0 log_channel(cluster) log [WRN] : slow request 961.093229 seconds old, received at 2015-07-31 13:47:24.397740: osd_op(client.67321.0:7863 default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [write 3145728~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag points reached
2015-07-31 14:03:25.491095 7f2cd95c5700  0 log_channel(cluster) log [WRN] : slow request 961.002957 seconds old, received at 2015-07-31 13:47:24.488012: osd_op(client.67321.0:7864 default.34169.37__shadow_.AnULxoR-51Q7fGdIVVP92CPeptlQJIm_226 [write 3670016~524288] 26.f9af7c89 ack+ondisk+write+known_if_redirected e9467) currently no flag points reached

How can I avoid these blocked requests? What is the root cause of this problem?
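In case it helps with diagnosis, here is what I plan to collect next (a sketch; it assumes the default admin socket path, that the ceph daemon commands are run on the host carrying osd.0, and that /dev/sdX stands in for that OSD's data disk):

# Ops currently stuck in osd.0, with the stage each one is waiting in
ceph daemon osd.0 dump_ops_in_flight

# Recently completed slow ops, with per-stage timings
ceph daemon osd.0 dump_historic_ops

# Commit/apply latency of every OSD, to see whether osd.0 stands out
ceph osd perf

# Whether the underlying SATA disk is saturated or failing
iostat -x 5 /dev/sdX
smartctl -a /dev/sdX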
2) Strange logs in RGW

2015-07-31 13:22:20.204781 7f9c02a5e700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f9b38fa7700' had timed out after 600
2015-07-31 13:22:25.204941 7f9c02a5e700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f9b38fa7700' had timed out after 600
2015-07-31 13:22:30.205093 7f9c02a5e700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f9b38fa7700' had timed out after 600
2015-07-31 13:22:35.205318 7f9c02a5e700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f9b38fa7700' had timed out after 600
2015-07-31 13:22:40.205428 7f9c02a5e700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f9b38fa7700' had timed out after 600
2015-07-31 13:22:45.205567 7f9c02a5e700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f9b38fa7700' had timed out after 600
2015-07-31 13:22:50.205742 7f9c02a5e700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f9b38fa7700' had timed out after 600
2015-07-31 13:22:55.205908 7f9c02a5e700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f9b38fa7700' had timed out after 600
2015-07-31 13:23:00.206063 7f9c02a5e700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f9b38fa7700' had timed out after 600
2015-07-31 13:23:05.206251 7f9c02a5e700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f9b38fa7700' had timed out after 600

Could you help me solve these problems?

--
Best Regards,
Stanislav Butkeev
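P.S. The "600" in the heartbeat_map messages seems to match the default of "rgw op thread timeout" (600 seconds), so I suspect the RGW warnings are just a symptom of the same requests stuck on osd.0 rather than a separate gateway fault. If raising that timeout is nevertheless advisable while we hunt the root cause, would something like this in ceph.conf be the right place? (a sketch; the section name is hypothetical and should match our gateway instance)

[client.radosgw.gateway]
# default is 600 seconds; this only papers over the stall, it does not fix it
rgw op thread timeout = 1200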