Dear ceph users,

I am running the following setup:

- 6 x OSD servers (CentOS 7, mostly HP DL180se G6 with Smart Array P410 controllers)
- each OSD server has 1-2 SSD journals, each handling ~5 7.2k SATA RE disks
- ceph-0.94.10 (hammer)

Normal operations work fine; however, when a single disk fails (or an OSD is taken down abruptly with 'ceph osd down'), all OSDs other than the ones inside the downed OSD's server experience slow responses and blocked requests (some more than others). For example:

2017-04-24 15:59:58.734235 7f2a62338700 0 log_channel(cluster) log [WRN] : slow request 30.571582 seconds old, received at 2017-04-24 15:59:28.162572: osd_op(client.11870166.0:118068448 rbd_data.42d93b436c6125.0000000000000577 [sparse-read 8192~4096] 1.a6422b98 ack+read e48964) currently reached_pg
2017-04-24 15:59:58.734241 7f2a62338700 0 log_channel(cluster) log [WRN] : slow request 30.569605 seconds old, received at 2017-04-24 15:59:28.164550: osd_op(client.11870166.0:118068449 rbd_data.42d93b436c6125.0000000000000577 [sparse-read 40960~8192] 1.a6422b98 ack+read e48964) currently reached_pg
....

In contrast, a planned 'ceph osd in' or 'ceph osd out' from a healthy state works fine and doesn't block requests.

References:

- ceph osd tree (osd.34 @ osd10 down): https://pastebin.com/s1AaNJM1
- ceph -s (when healthy): https://pastebin.com/h0NLgbG0
- OSD cluster performance during rebuild @ 15:45-17:30: https://imagebin.ca/v/3KEsK0pGeOR3
- OSD cluster I/O wait during rebuild @ 15:45-17:30: https://imagebin.ca/v/3KErkQ4KC8sv

So far I have tried reducing the recovery/backfill priority as follows, but to no avail:

ceph tell osd.* injectargs '--osd-max-backfills 1'
ceph tell osd.* injectargs '--osd-recovery-max-active 1'
ceph tell osd.* injectargs '--osd-recovery-op-priority 1'
ceph tell osd.* injectargs '--osd-client-op-priority 63'

Is this a case of a few slow OSDs dragging down the rest? Or is my setup/hardware substandard? Any pointers on what I should look into next would be greatly appreciated - thanks.

--
--sazli
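
P.S. In case injectargs silently failed, here is a sketch of how I plan to double-check that the injected values actually took effect on the running daemons (osd.0 below is just an example; the admin socket path is the CentOS default):

# ask one running daemon for its current value of each knob
ceph daemon osd.0 config get osd_max_backfills
ceph daemon osd.0 config get osd_recovery_max_active
ceph daemon osd.0 config get osd_recovery_op_priority
ceph daemon osd.0 config get osd_client_op_priority

# or dump the full running config through the admin socket and filter
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep -E 'backfills|recovery_max_active|op_priority'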
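
P.P.S. On the "slow OSD dragging the others" theory: my assumption is that the per-OSD journal commit/apply latencies from 'ceph osd perf' (which I believe is available on hammer) should show an outlier while the rebuild is running, e.g.:

# dump fs_commit_latency(ms) / fs_apply_latency(ms) for every OSD
ceph osd perf
# skip the header line and sort by apply latency to surface the slowest OSDs
# (column positions assumed from the hammer output format)
ceph osd perf | tail -n +2 | sort -n -k3 | tail -5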