Dear All,

I wonder if anyone can give advice regarding BlueStore OSDs going down on Luminous 12.0.3 when the cluster is under moderate (200 MB/s) load. OSDs seem to go down randomly across the 5 OSD servers. When one OSD is down, load decreases, so no further OSDs drop until I restart the OSD; then another one fails.

There are no obvious disk errors in /var/log/messages. Here, though, is part of ceph-osd.46.log:

2017-05-27 10:42:28.781821 7f7c503b4700 0 log_channel(cluster) log [WRN] : slow request 120.247733 seconds old, received at 2017-05-27 10:40:28.534048: osd_op(osd.52.1251:8732 1.3ffs0 1.a8ec73ff (undecoded) ondisk+read+rwordered+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e1342) currently queued_for_pg
2017-05-27 10:42:33.748109 7f7c52bf1700 1 heartbeat_map is_healthy 'tp_osd_tp thread tp_osd_tp' had timed out after 60

These two errors repeat, with the 'tp_osd_tp thread tp_osd_tp' timeout appearing more often than the slow request warning.

Hopefully this is not due to the HighPoint Rocket R750 cards in my OSD servers (the OSD servers are all 45Drives Storinators).

Other info: each node has 64 GB RAM, 10 x 8 TB IronWolf drives, a 10 Gb Intel NIC, and a single E5-2620 v4.

Any advice gratefully received!

Thanks,
Jake

--
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com