Luminous: bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60

"Jake Grimmett" <jog@xxxxxxxxxxxxxxxxx> · Sat, 27 May 2017 11:02:41 +0100

Dear All,

I wonder if anyone can give advice regarding bluestore OSDs going down on
Luminous 12.0.3 when the cluster is under moderate (200MB/s) load.

OSDs seem to go down randomly across the 5 OSD servers. When one OSD is
down, load decreases, so no further OSDs drop until I restart that OSD;
then another fails.

There are no obvious disk errors seen in /var/log/messages.

Here, though, is part of ceph-osd.46.log:

2017-05-27 10:42:28.781821 7f7c503b4700  0 log_channel(cluster) log [WRN]
: slow request 120.247733 seconds old, received at 2017-05-27
10:40:28.534048: osd_op(osd.52.1251:8732 1.3ffs0 1.a8ec73ff (undecoded)
ondisk+read+rwordered+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected
e1342) currently queued_for_pg
2017-05-27 10:42:33.748109 7f7c52bf1700  1 heartbeat_map is_healthy
'tp_osd_tp thread tp_osd_tp' had timed out after 60

These two messages repeat, with the 'tp_osd_tp thread tp_osd_tp' timeout
appearing more often than the slow request warning.
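
To put numbers on how often each message appears, a minimal sketch along
these lines could be run against the OSD log. It assumes each log entry sits
on a single line in the file (the excerpt above is wrapped by the mail
client), and that the log lives under /var/log/ceph/, which is only the usual
default. The function name summarize and the regular expressions are
illustrative, not part of Ceph.

```python
import re
import sys

# Patterns matched against the two messages quoted above (illustrative only).
SLOW_RE = re.compile(r"slow request (\d+(?:\.\d+)?) seconds old")
TIMEOUT_RE = re.compile(r"'[^']+' had timed out after \d+")


def summarize(lines):
    """Count slow-request warnings and heartbeat timeouts in log lines."""
    slow = 0
    timeouts = 0
    max_age = 0.0  # oldest slow request seen, in seconds
    for line in lines:
        match = SLOW_RE.search(line)
        if match:
            slow += 1
            max_age = max(max_age, float(match.group(1)))
        if TIMEOUT_RE.search(line):
            timeouts += 1
    return {
        "slow_request": slow,
        "heartbeat_timeout": timeouts,
        "max_slow_age": max_age,
    }


if __name__ == "__main__":
    # Assumed default location; pass a different path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/ceph/ceph-osd.46.log"
    with open(path, errors="replace") as handle:
        print(summarize(handle))
```

Running it per OSD log would show whether the timeouts cluster on particular
OSDs or hosts, which may help separate a controller problem from a general
load problem.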

Hopefully this is not due to the HighPoint Rocket r750 cards in my OSD
servers (the OSD servers are all 45drive.com Storinators).

Other info: each node has 64GB RAM, 10 x 8TB IronWolf drives, a 10Gb Intel
NIC, and a single E5-2620 v4.

Any advice gratefully received!

thanks,

Jake