On 05/30/2017 05:07 AM, nokia ceph wrote:
Hello Mark, I am able to reproduce this problem every time.
OK, next question: does it happen 60s after starting the 200MB/s load, or does it take a while? It sounds like it's pretty random across OSDs? I think we want to figure out what state the tp_osd_tp threads are in when this is happening (maybe via a gdb backtrace, or by using the wallclock profiler to gather several samples) and then work out from the logs what led to the chain of events that put them in that state.
Mark
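A minimal sketch of how such samples could be gathered, assuming gdb and the ceph debuginfo packages are installed on the OSD host and that the OSD runs as the standard ceph-osd@<id> systemd unit (osd.19 below is only an example):

  # find the PID of the affected OSD daemon
  OSD_PID=$(systemctl show -p MainPID ceph-osd@19 | cut -d= -f2)

  # take a backtrace of every thread a few times while the
  # 'tp_osd_tp ... had timed out' messages are being logged;
  # note that attaching gdb briefly pauses the OSD
  for i in 1 2 3; do
      gdb --batch -p "$OSD_PID" -ex 'thread apply all bt' > /tmp/osd.19.bt.$i
      sleep 5
  done

Comparing where the tp_osd_tp threads sit across the samples acts as a poor man's wallclock profile and should show what they are waiting on.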
Env:
- 5 nodes, v12.0.3, EC 4+1 bluestore, RHEL 7.3 - 3.10.0-514.el7.x86_64
- Tested with debug bluestore = 20...

From ceph watch
===============
2017-05-30 08:57:33.510794 mon.0 [INF] pgmap v15649: 8192 pgs: 8192 active+clean; 774 GB data, 1359 GB used, 1854 TB / 1855 TB avail; 35167 B/s rd, 77279 kB/s wr, 726 op/s
2017-05-30 08:57:35.134604 mon.0 [INF] pgmap v15650: 8192 pgs: 8192 active+clean; 774 GB data, 1359 GB used, 1854 TB / 1855 TB avail; 19206 B/s rd, 63835 kB/s wr, 579 op/s
2017-05-30 08:57:36.295086 mon.0 [INF] pgmap v15651: 8192 pgs: 8192 active+clean; 774 GB data, 1359 GB used, 1854 TB / 1855 TB avail; 10999 B/s rd, 44093 kB/s wr, 379 op/s
2017-05-30 08:57:37.228422 mon.0 [INF] osd.243 10.50.62.154:6895/1842853 failed (4 reporters from different host after 20.062159 >= grace 20.000000)
2017-05-30 08:57:37.234308 mon.0 [INF] osd.19 10.50.62.152:6856/1940715 failed (4 reporters from different host after 20.000234 >= grace 20.000000)
2017-05-30 08:57:37.368342 mon.0 [INF] pgmap v15652: 8192 pgs: 8192 active+clean; 774 GB data, 1360 GB used, 1854 TB / 1855 TB avail; 12628 B/s rd, 50471 kB/s wr, 444 op/s
2017-05-30 08:57:37.506743 mon.0 [INF] osdmap e848: 340 osds: 338 up, 340 in

From failed osd.19 log
======================
2017-05-30 08:57:04.155836 7f9d6c723700 10 bluestore(/var/lib/ceph/osd/ceph-19) _omap_setkeys 1.1e85s4_head 4#1:a1780000::::head# = 0
2017-05-30 08:57:04.155840 7f9d6c723700 10 bluestore(/var/lib/ceph/osd/ceph-19) _txc_calc_cost 0x7f9da2d1f180 cost 3542664 (2 ios * 1500000 + 542664 bytes)
2017-05-30 08:57:04.155841 7f9d6c723700 20 bluestore(/var/lib/ceph/osd/ceph-19) _txc_write_nodes txc 0x7f9da2d1f180 onodes 0x7f9da6651840 shared_blobs
2017-05-30 08:57:04.155843 7f9d6c723700 20 bluestore.extentmap(0x7f9da6651930) update 4#1:a179feb3:::%2fc20%2fvx0000033%2fpot%2fchannel1%2fhls%2fvar2660000%2f00000%2fseg0015.ts:head#
2017-05-30 08:57:04.155852 7f9d6c723700 20 bluestore.extentmap(0x7f9da6651930) update inline shard 607 bytes from 2 extents
2017-05-30 08:57:04.155858 7f9d6c723700 20 bluestore(/var/lib/ceph/osd/ceph-19) onode 4#1:a179feb3:::%2fc20%2fvx0000033%2fpot%2fchannel1%2fhls%2fvar2660000%2f00000%2fseg0015.ts:head# is 1348 (735 bytes onode + 2 bytes spanning blobs + 611 bytes inline extents)
2017-05-30 08:57:04.155876 7f9d6c723700 20 bluestore(/var/lib/ceph/osd/ceph-19) _txc_finalize_kv txc 0x7f9da2d1f180 allocated 0x[388af10000~90000] released 0x[]
2017-05-30 08:57:04.155880 7f9d6c723700 10 freelist allocate 0x388af10000~90000
2017-05-30 08:57:04.155882 7f9d6c723700 20 freelist _xor first_key 0x388af00000 last_key 0x388af80000
2017-05-30 08:57:07.698556 7f9d6a71f700 20 bluestore(/var/lib/ceph/osd/ceph-19) statfs store_statfs(0x574506f0000/0x5753b990000, stored 0xa389a531/0xa70e0000, compress 0x0/0x0/0x0)
2017-05-30 08:57:08.201089 7f9d6a71f700 20 bluestore(/var/lib/ceph/osd/ceph-19) statfs store_statfs(0x574506f0000/0x5753b990000, stored 0xa389a531/0xa70e0000, compress 0x0/0x0/0x0)
2017-05-30 08:57:09.903290 7f9d6a71f700 20 bluestore(/var/lib/ceph/osd/ceph-19) statfs store_statfs(0x574506f0000/0x5753b990000, stored 0xa389a531/0xa70e0000, compress 0x0/0x0/0x0)
2017-05-30 08:57:11.605590 7f9d6a71f700 20 bluestore(/var/lib/ceph/osd/ceph-19) statfs store_statfs(0x574506f0000/0x5753b990000, stored 0xa389a531/0xa70e0000, compress 0x0/0x0/0x0)
2017-05-30 08:57:15.707908 7f9d6a71f700 20 bluestore(/var/lib/ceph/osd/ceph-19) statfs store_statfs(0x574506f0000/0x5753b990000, stored 0xa389a531/0xa70e0000, compress 0x0/0x0/0x0)
2017-05-30 08:57:17.410365 7f9d6a71f700 20 bluestore(/var/lib/ceph/osd/ceph-19) statfs store_statfs(0x574506f0000/0x5753b990000, stored 0xa389a531/0xa70e0000, compress 0x0/0x0/0x0)
2017-05-30 08:57:18.512773 7f9d6a71f700 20 bluestore(/var/lib/ceph/osd/ceph-19) statfs store_statfs(0x574506f0000/0x5753b990000, stored 0xa389a531/0xa70e0000, compress 0x0/0x0/0x0)
2017-05-30 08:57:20.005812 7f9d8f712700 1 heartbeat_map is_healthy 'tp_osd_tp thread tp_osd_tp' had timed out after 60
2017-05-30 08:57:20.006648 7f9d8ff13700 1 heartbeat_map is_healthy 'tp_osd_tp thread tp_osd_tp' had timed out after 60
2017-05-30 08:57:20.015585 7f9d8f712700 1 heartbeat_map is_healthy 'tp_osd_tp thread tp_osd_tp' had timed out after 60
2017-05-30 08:57:20.015677 7f9d8f712700 1 heartbeat_map is_healthy 'tp_osd_tp thread tp_osd_tp' had timed out after 60

..<snip>..

2017-05-30 08:57:37.506457 7f9d875fd700 20 bluestore(/var/lib/ceph/osd/ceph-19) _do_write perform garbage collection, expected benefit = 0 AUs
2017-05-30 08:57:37.506459 7f9d875fd700 20 bluestore.extentmap(0x7f9da0ed50b0) dirty_range mark inline shard dirty
2017-05-30 08:57:37.506461 7f9d875fd700 10 bluestore(/var/lib/ceph/osd/ceph-19) _write meta #-1:7b3f43c4:::osd_superblock:0# 0x0~1f8 = 0
2017-05-30 08:57:37.506463 7f9d875fd700 10 bluestore(/var/lib/ceph/osd/ceph-19) _txc_calc_cost 0x7f9da2d22680 cost 3104677 (2 ios * 1500000 + 104677 bytes)
2017-05-30 08:57:37.506479 7f9d875fd700 20 bluestore(/var/lib/ceph/osd/ceph-19) _txc_write_nodes txc 0x7f9da2d22680 onodes 0x7f9da0ed4fc0,0x7f9da5bd8000,0x7f9da72dad80 shared_blobs
2017-05-30 08:57:37.506482 7f9d875fd700 20 bluestore.extentmap(0x7f9da0ed50b0) update #-1:7b3f43c4:::osd_superblock:0#
2017-05-30 08:57:37.506485 7f9d875fd700 20 bluestore.extentmap(0x7f9da0ed50b0) update inline shard 81 bytes from 1 extents
2017-05-30 08:57:37.506491 7f9d875fd700 20 bluestore(/var/lib/ceph/osd/ceph-19) onode #-1:7b3f43c4:::osd_superblock:0# is 108 (21 bytes onode + 2 bytes spanning blobs + 85 bytes inline extents)
2017-05-30 08:57:37.506495 7f9d875fd700 20 bluestore.extentmap(0x7f9da5bd80f0) update #-1:a51ca935:::osdmap.848:0#
2017-05-30 08:57:37.506497 7f9d875fd700 20 bluestore.extentmap(0x7f9da5bd80f0) update inline shard 235 bytes from 2 extents
2017-05-30 08:57:37.506499 7f9d875fd700 20 bluestore(/var/lib/ceph/osd/ceph-19) onode #-1:a51ca935:::osdmap.848:0# is 264 (23 bytes onode + 2 bytes spanning blobs + 239 bytes inline extents)
2017-05-30 08:57:37.506502 7f9d875fd700 20 bluestore.extentmap(0x7f9da72dae70) update #-1:48acce63:::inc_osdmap.848:0#
2017-05-30 08:57:37.506504 7f9d875fd700 20 bluestore.extentmap(0x7f9da72dae70) update inline shard 82 bytes from 1 extents
2017-05-30 08:57:37.506506 7f9d875fd700 20 bluestore(/var/lib/ceph/osd/ceph-19) onode #-1:48acce63:::inc_osdmap.848:0# is 110 (22 bytes onode + 2 bytes spanning blobs + 86 bytes inline extents)
2017-05-30 08:57:37.506558 7f9d875fd700 20 bluestore(/var/lib/ceph/osd/ceph-19) _txc_finalize_kv txc 0x7f9da2d22680 allocated 0x[388afa0000~30000] released 0x[]
2017-05-30 08:57:37.506562 7f9d875fd700 10 freelist allocate 0x388afa0000~30000
2017-05-30 08:57:37.506563 7f9d875fd700 20 freelist _xor first_key 0x388af80000 last_key 0x388af80000
2017-05-30 08:57:37.510446 7f9d8ff13700 0 -- 10.50.63.152:6857/1940715 >> 10.50.63.153:6869/1572972 conn(0x7f9da698a800 :6857 s=STATE_OPEN pgs=243 cs=1 l=0).fault with nothing to send, going to standby
2017-05-30 08:57:37.514366 7f9d8ff13700 0 -- 10.50.63.152:6857/1940715 >> 10.50.63.153:6863/1572982 conn(0x7f9da620f800 :6857 s=STATE_OPEN pgs=149 cs=1 l=0).fault with nothing to send, going to standby
2017-05-30 08:57:37.517351 7f9d8ef11700 1 heartbeat_map is_healthy 'tp_osd_tp thread tp_osd_tp' had timed out after 60
2017-05-30 08:57:37.517354 7f9d8ef11700 1 heartbeat_map is_healthy 'tp_osd_tp thread tp_osd_tp' had timed out after 60

====
Config value --> threadpool_default_timeout = 60

    // default timeout while calling WaitInterval on an empty queue
    OPTION(threadpool_default_timeout, OPT_INT, 60)

I altered the above value to 300 (see the configuration sketch at the end of this message); it then served for some time:

2017-05-30 09:44:32.027168 7efc1f2b9700 20 bluestore(/var/lib/ceph/osd/ceph-41) statfs store_statfs(0x574f2860000/0x5753b990000, stored 0x2e65ed1/0x6730000, compress 0x0/0x0/0x0)
2017-05-30 09:44:35.529164 7efc1f2b9700 20 bluestore(/var/lib/ceph/osd/ceph-41) statfs store_statfs(0x574f2860000/0x5753b990000, stored 0x2e65ed1/0x6730000, compress 0x0/0x0/0x0)
2017-05-30 09:44:40.831130 7efc1f2b9700 20 bluestore(/var/lib/ceph/osd/ceph-41) statfs store_statfs(0x574f2860000/0x5753b990000, stored 0x2e65ed1/0x6730000, compress 0x0/0x0/0x0)
2017-05-30 09:44:41.634056 7efc3fafa700 -1 osd.41 878 heartbeat_check: no reply from 10.50.62.152:6853 osd.30 since back 2017-05-30 09:44:21.521517 front 2017-05-30 09:44:21.521517 (cutoff 2017-05-30 09:44:21.634053)
2017-05-30 09:44:42.128280 7efc442ac700 1 heartbeat_map is_healthy 'tp_osd_tp thread tp_osd_tp' had timed out after 300
2017-05-30 09:44:42.128282 7efc442ac700 1 heartbeat_map is_healthy 'tp_osd_tp thread tp_osd_tp' had timed out after 300
2017-05-30 09:44:42.128284 7efc442ac700 1 heartbeat_map is_healthy 'tp_osd_tp thread tp_osd_tp' had timed out after 300

Do you require any logs from my end? Shall I open a tracker issue for this?

Thanks,
Jayaram

On Mon, May 29, 2017 at 4:58 AM, Mark Nelson <mnelson@xxxxxxxxxx> wrote:

Hi Jake,

200MB/s is pretty low load across 5 servers. I wouldn't expect the tp_osd_tp threads to be so heavily loaded that they stop responding for 60s. Sounds like a bug. Can you reproduce it? It might be worth trying it with debug bluestore = 20.

Mark

On 05/27/2017 05:02 AM, Jake Grimmett wrote:

Dear All,

I wonder if anyone can give advice regarding bluestore OSDs going down on Luminous 12.0.3 when the cluster is under moderate (200MB/s) load. OSDs seem to go down randomly across the 5 OSD servers. When one OSD is down, load decreases, so no further OSDs drop until I restart the OSD; then another fails. There are no obvious disk errors seen in /var/log/messages.

Here though is part of a ceph-osd.46.log:

2017-05-27 10:42:28.781821 7f7c503b4700 0 log_channel(cluster) log [WRN] : slow request 120.247733 seconds old, received at 2017-05-27 10:40:28.534048: osd_op(osd.52.1251:8732 1.3ffs0 1.a8ec73ff (undecoded) ondisk+read+rwordered+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e1342) currently queued_for_pg
2017-05-27 10:42:33.748109 7f7c52bf1700 1 heartbeat_map is_healthy 'tp_osd_tp thread tp_osd_tp' had timed out after 60

- these two errors repeat, with more of the 'tp_osd_tp thread tp_osd_tp' errors.
Hopefully this is not due to the HighPoint Rocket r750 cards in my OSD servers (the OSD servers are all 45drive.com Storinators).

Other info: each node has 64 GB RAM, 10 x 8 TB IronWolf drives, a 10Gb Intel NIC, and a single E5-2620 v4.

Any advice gratefully received!

thanks,
Jake
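For reference, a sketch of how the threadpool_default_timeout workaround and the debug bluestore = 20 logging mentioned above could be applied. This assumes a standard ceph.conf rollout to the OSD hosts; injectargs may not apply the timeout change without an OSD restart. Note that raising the timeout only widens the heartbeat grace (the osd.41 log above still times out at 300); it does not remove the underlying stall.

  # ceph.conf on the OSD hosts, followed by an OSD restart
  [osd]
      debug bluestore = 20
      # workaround only; the default is 60
      threadpool_default_timeout = 300

  # or injected at runtime
  ceph tell osd.* injectargs '--debug_bluestore 20/20 --threadpool_default_timeout 300'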
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com