All osd slow response / blocked requests upon single disk failure

Dear ceph users,

I am running the following setup:-
- 6 x osd servers (centos 7, mostly HP DL180se G6 with SA P410 controllers)
- Each osd server has 1-2 SSD journals, each handling ~5 7.2k SATA RE disks
- ceph-0.94.10

Normal operations work OK, however when a single disk fails (or an osd
is taken down abruptly with 'ceph osd down'), all osds other than the
ones on the downed osd's host experience slow responses and blocked
requests (some more than others). For example:-

2017-04-24 15:59:58.734235 7f2a62338700  0 log_channel(cluster) log
[WRN] : slow request 30.571582 seconds old, received at 2017-04-24
15:59:28.162572: osd_op(client.11870166.0:118068448
rbd_data.42d93b436c6125.0000000000000577 [sparse-read 8192~4096]
1.a6422b98 ack+read e48964) currently reached_pg
2017-04-24 15:59:58.734241 7f2a62338700  0 log_channel(cluster) log
[WRN] : slow request 30.569605 seconds old, received at 2017-04-24
15:59:28.164550: osd_op(client.11870166.0:118068449
rbd_data.42d93b436c6125.0000000000000577 [sparse-read 40960~8192]
1.a6422b98 ack+read e48964) currently reached_pg
....
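
(For what it's worth, the blocked requests can also be inspected per
osd via the admin socket -- a rough sketch below, run on the host that
carries the osd in question; 'osd.<id>' stands for whichever osd is
reporting the slow requests:)

ceph health detail                         # lists which osds currently have slow/blocked requests
ceph daemon osd.<id> dump_ops_in_flight    # ops currently stuck on that osd
ceph daemon osd.<id> dump_historic_ops     # recently completed slow ops with their timings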

In contrast, a normal planned 'ceph osd in' or 'ceph osd out' from a
healthy state works OK and doesn't block requests.

References:-
- ceph osd tree (osd.34 @ osd10 down) : https://pastebin.com/s1AaNJM1
- ceph -s (when healthy): https://pastebin.com/h0NLgbG0
- osd cluster performance during rebuild @ 15:45 - 17:30 :
https://imagebin.ca/v/3KEsK0pGeOR3
- osd cluster i/o wait during rebuild @ 15:45 - 17:30 :
https://imagebin.ca/v/3KErkQ4KC8sv

So far I have tried lowering the recovery/backfill priority as follows, but to no avail:-
ceph tell osd.* injectargs '--osd-max-backfills 1'
ceph tell osd.* injectargs '--osd-recovery-max-active 1'
ceph tell osd.* injectargs '--osd-recovery-op-priority 1'
ceph tell osd.* injectargs '--osd-client-op-priority 63'
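
(For completeness, the same throttles can also be set persistently in
ceph.conf so they survive restarts -- a sketch of the [osd] section,
assuming the same values as above:)

[osd]
osd max backfills = 1
osd recovery max active = 1
osd recovery op priority = 1
osd client op priority = 63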

Is this a case of one slow osd dragging down the others? Or is my
setup / hardware simply substandard? Any pointers on what I should look
into next would be greatly appreciated - thanks.
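
(If it is a single slow osd dragging the rest, I assume the per-osd
latencies and the disk utilisation on each host would show it -- e.g.:)

ceph osd perf      # fs_commit_latency / fs_apply_latency per osd
iostat -x 5        # on each osd server; look for a disk stuck near 100 %util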

-- 
--sazli


