Thanks for this advice. It helped me identify a subset of devices (only 3 in the whole cluster) where this problem was happening. The SAS adapter (LSI SAS 3008) on my Supermicro board was the issue: RAID mode is enabled by default. I flashed the latest firmware (v16) and switched to IT mode (no RAID).
Issues with slow requests immediately ceased. I hope it will help someone else with the same issue :-)
Best,
Martin
I am afraid I was not clear enough. Suppose that ceph health detail reports a slow request involving osd.14.

In the osd.14 log I see this line:

2019-02-24 16:58:39.475740 7fe25a84d700 0 log_channel(cluster) log [WRN] : slow request 30.328572 seconds old, received at 2019-02-24 16:58:09.147037: osd_op(client.148580771.0:476351313 8.1d6 8:6ba6a916:::rbd_data.ba32e7238e1f29.00000000000004b3:head [set-alloc-hint object_size 4194304 write_size 4194304,write 3776512~4096] snapc 0=[] ondisk+write+known_if_redirected e1242718) currently op_applied

Here the PG is 8.1d6:

[root@ceph-osd-02 ceph]# ceph pg map 8.1d6
osdmap e1247126 pg 8.1d6 (8.1d6) -> up [14,38,24] acting [14,38,24]

So the problem is not necessarily in osd.14. It could also be in osd.38 or osd.24, or in the relevant hosts.
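In case it helps anyone repeating this lookup for many slow requests, here is a minimal Python sketch of the two steps described above: pull the PG ID out of a slow-request log line, then read the up and acting OSD sets from the `ceph pg map` output. The function names are made up for illustration, and the regexes assume the log line and the `ceph pg map` output look exactly like the samples quoted above; other Ceph releases may format them differently.

```python
import re

# Sample text copied from the message above.
LOG_LINE = (
    "2019-02-24 16:58:39.475740 7fe25a84d700 0 log_channel(cluster) log [WRN] : "
    "slow request 30.328572 seconds old, received at 2019-02-24 16:58:09.147037: "
    "osd_op(client.148580771.0:476351313 8.1d6 "
    "8:6ba6a916:::rbd_data.ba32e7238e1f29.00000000000004b3:head "
    "[set-alloc-hint object_size 4194304 write_size 4194304,write 3776512~4096] "
    "snapc 0=[] ondisk+write+known_if_redirected e1242718) currently op_applied"
)
PG_MAP_OUTPUT = "osdmap e1247126 pg 8.1d6 (8.1d6) -> up [14,38,24] acting [14,38,24]"


def pg_from_slow_request(line):
    """Return the PG ID (e.g. '8.1d6') from a slow-request osd_op line, or None.

    Hypothetical helper: assumes the PG ID is the second token inside osd_op(...).
    """
    m = re.search(r"osd_op\(\S+ ([0-9]+\.[0-9a-f]+) ", line)
    return m.group(1) if m else None


def osds_from_pg_map(output):
    """Return (up, acting) lists of OSD ids parsed from `ceph pg map` output.

    Hypothetical helper: assumes the 'up [...] acting [...]' layout shown above.
    """
    m = re.search(r"up \[([\d,]*)\] acting \[([\d,]*)\]", output)
    if not m:
        raise ValueError("unrecognized ceph pg map output")

    def to_ids(field):
        return [int(x) for x in field.split(",") if x]

    return to_ids(m.group(1)), to_ids(m.group(2))


if __name__ == "__main__":
    pg = pg_from_slow_request(LOG_LINE)
    up, acting = osds_from_pg_map(PG_MAP_OUTPUT)
    # Every OSD in the acting set (and its host) is a candidate, not only the primary.
    print("PG", pg, "candidates:", ", ".join("osd.%d" % i for i in acting))
```

On a live cluster the second string would come from running `ceph pg map <pgid>` for each PG ID found in the logs; counting how often each OSD appears across many slow requests is one way to narrow down a shared culprit.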
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com