Re: REQUEST_SLOW across many OSDs at the same time

"mart.v" <mart.v@xxxxxxxxx> · Mon, 01 Apr 2019 12:42:43 +0200 (CEST)

Thanks for this advice. It helped me identify a subset of devices (only 3 in the whole cluster) where this problem was happening. The SAS adapter (LSI SAS 3008) on my Supermicro board was the issue: it has a RAID mode enabled by default. I flashed the latest firmware (v16) and switched it to IT mode (no RAID).

Issues with slow requests immediately ceased. I hope it will help someone else with the same issue :-)

Best, 
Martin

I am afraid I was not clear enough. Suppose that "ceph health detail" reports a slow request involving osd.14.

In the osd.14 log I see this line:

2019-02-24 16:58:39.475740 7fe25a84d700  0 log_channel(cluster) log [WRN] : slow request 30.328572 seconds old, received at 2019-02-24 16:58:09.147037: osd_op(client.148580771.0:476351313 8.1d6 8:6ba6a916:::rbd_data.ba32e7238e1f29.00000000000004b3:head [set-alloc-hint object_size 4194304 write_size 4194304,write 3776512~4096] snapc 0=[] ondisk+write+known_if_redirected e1242718) currently op_applied

Here the PG is 8.1d6 (pool 8, placement group 1d6).
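When there are many such warnings, pulling the PG id out by hand gets tedious. Below is a minimal sketch that extracts the request age, client op id and PG id from a line like the one above. It assumes the "slow request ... osd_op(<client> <pgid> ..." wording matches your Ceph release. The regex and function name are my own, not part of Ceph.

```python
import re

# Pattern for the "slow request" warning as it appears in the osd.14 log above.
# Assumption: the wording and field order match your Ceph release; check it
# against your own logs before relying on it.
SLOW_REQ_RE = re.compile(
    r"slow request (?P<age>[\d.]+) seconds old.*?"
    r"osd_op\((?P<client>\S+) (?P<pgid>\d+\.[0-9a-f]+) "
)


def parse_slow_request(line):
    """Return age (s), client op id and PG id from a slow-request line, or None."""
    m = SLOW_REQ_RE.search(line)
    if m is None:
        return None
    return {
        "age": float(m.group("age")),
        "client": m.group("client"),
        "pgid": m.group("pgid"),  # pool id (decimal) "." PG number (hex)
    }


line = ("2019-02-24 16:58:39.475740 7fe25a84d700  0 log_channel(cluster) log [WRN] : "
        "slow request 30.328572 seconds old, received at 2019-02-24 16:58:09.147037: "
        "osd_op(client.148580771.0:476351313 8.1d6 8:6ba6a916:::rbd_data.ba32e7238e1f29"
        ".00000000000004b3:head")
print(parse_slow_request(line)["pgid"])  # 8.1d6
```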

# ceph pg map 8.1d6
osdmap e1247126 pg 8.1d6 (8.1d6) -> up [14,38,24] acting [14,38,24]
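The "acting" list is the set of OSDs that actually serve the PG, with the primary listed first. For a replicated write, the op has to complete on all of them. Below is a small sketch that parses the plain-text "ceph pg map" output shown above into the up and acting sets. It assumes the "-> up [...] acting [...]" layout is the same on your release; the helper name is hypothetical.

```python
import re

# Pattern for the plain-text output of "ceph pg map <pgid>" shown above.
# Assumption: the "-> up [...] acting [...]" layout matches your release.
PG_MAP_RE = re.compile(
    r"pg (?P<pgid>\S+) \(\S+\) -> up \[(?P<up>[\d,]*)\] acting \[(?P<acting>[\d,]*)\]"
)


def parse_pg_map(output):
    """Return (pgid, up OSD ids, acting OSD ids) from 'ceph pg map' output."""
    m = PG_MAP_RE.search(output)
    if m is None:
        raise ValueError("unrecognised 'ceph pg map' output: %r" % output)

    def ids(field):
        # "14,38,24" -> [14, 38, 24]; an empty field gives []
        return [int(x) for x in field.split(",") if x]

    return m.group("pgid"), ids(m.group("up")), ids(m.group("acting"))


out = "osdmap e1247126 pg 8.1d6 (8.1d6) -> up [14,38,24] acting [14,38,24]"
pgid, up, acting = parse_pg_map(out)
print(pgid, acting)  # 8.1d6 [14, 38, 24]
```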

So the problem is not necessarily in osd.14. It could also be in osd.38 or osd.24, or in the hosts they run on.
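A single PG only tells you the problem is somewhere in its three OSDs. Repeating the lookup over many slow requests and counting how often each OSD appears in the acting sets tends to make the common culprits stand out. Below is a minimal sketch of that tally. The input lists are hypothetical examples, not data from this cluster.

```python
from collections import Counter


def rank_suspect_osds(acting_sets):
    """Count in how many slow-request PGs each OSD appears, most frequent first."""
    counts = Counter()
    for acting in acting_sets:
        counts.update(set(acting))  # count each OSD at most once per PG
    return counts.most_common()


# Hypothetical acting sets collected from several slow-request warnings.
samples = [[14, 38, 24], [7, 38, 2], [38, 14, 9]]
print(rank_suspect_osds(samples)[:2])  # [(38, 3), (14, 2)]
```

OSDs near the top of the list, and the hosts or controllers they share, are the first places to look.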

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com