Hello Henrik,
Thanks for the update. I applied the two changes below and will monitor the cluster.
~~~
bluestore_deferred_throttle_bytes = 0
bluestore_throttle_bytes = 0
~~~
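For anyone following along, a sketch of how these two options could be applied; this assumes the standard /etc/ceph/ceph.conf path and that the cluster keyring allows `ceph tell` (both are assumptions, not confirmed by the thread):

```shell
# Persist the settings across restarts (assumed standard config path):
cat >> /etc/ceph/ceph.conf <<'EOF'
[osd]
bluestore_deferred_throttle_bytes = 0
bluestore_throttle_bytes = 0
EOF

# Or inject them into the running OSDs without a restart:
ceph tell osd.* injectargs \
    '--bluestore_deferred_throttle_bytes 0 --bluestore_throttle_bytes 0'
```

Setting both throttles to 0 disables BlueStore's transaction throttling, which is the workaround discussed in the tracker issue mentioned below.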
Attached is the gdb trace taken after installing the debug RPMs.
Thanks
On Wed, Sep 20, 2017 at 12:18 PM, Henrik Korkuc <lists@xxxxxxxxx> wrote:
On 17-09-20 08:06, nokia ceph wrote:
Hello,
Env: RHEL 7.2, kernel 3.10.0-327.el7.x86_64, EC 4+1, bluestore.
We are writing to ceph via the librados C API. Testing with rados shows no issues, and the same setup tested fine with Jewel/Kraken. Could you advise on how to debug this issue?
Maybe similar to http://tracker.ceph.com/issues/21180 ? It seems it was resolved for me with the mentioned fix. You could apply the mentioned config options and see if it helps (and build a newer version, if able). I think you may be missing debug packages, as there are no thread names.
OSD.log:
~~~
2017-09-18 14:51:59.895746 7f1e744e0700 0 log_channel(cluster) log [WRN] : slow request 60.068824 seconds old, received at 2017-09-18 14:50:59.826849: MOSDECSubOpWriteReply(1.132s0 1350/1344 ECSubWriteReply(tid=971, last_complete=1350'153, committed=1, applied=0)) currently queued_for_pg
2017-09-18 14:51:59.895749 7f1e744e0700 0 log_channel(cluster) log [WRN] : slow request 60.068737 seconds old, received at 2017-09-18 14:50:59.826936: MOSDECSubOpWriteReply(1.132s0 1350/1344 ECSubWriteReply(tid=971, last_complete=0'0, committed=0, applied=1)) currently queued_for_pg
2017-09-18 14:51:59.895754 7f1e744e0700 0 log_channel(cluster) log [WRN] : slow request 60.067539 seconds old, received at 2017-09-18 14:50:59.828134: MOSDECSubOpWriteReply(1.132s0 1350/1344 ECSubWriteReply(tid=971, last_complete=1350'153, committed=1, applied=0)) currently queued_for_pg
2017-09-18 14:51:59.923825 7f1e71cdb700 10 trim shard target 102 M meta/data ratios 0.5 + 0 (52428 k + 0 ), current 1359 k (1083 k + 276 k)
2017-09-18 14:51:59.923835 7f1e71cdb700 10 trim shard target 102 M meta/data ratios 0.5 + 0 (52428 k + 0 ), current 1066 k (1066 k + 0 )
2017-09-18 14:51:59.923837 7f1e71cdb700 10 trim shard target 102 M meta/data ratios 0.5 + 0 (52428 k + 0 ), current 643 k (643 k + 0 )
2017-09-18 14:51:59.923840 7f1e71cdb700 10 trim shard target 102 M meta/data ratios 0.5 + 0 (52428 k + 0 ), current 1049 k (1049 k + 0 )
2017-09-18 14:51:59.923842 7f1e71cdb700 10 trim shard target 102 M meta/data ratios 0.5 + 0 (52428 k + 0 ), current 896 k (896 k + 0 )
2017-09-18 14:51:59.940780 7f1e77ca5700 20 osd.181 1350 share_map_peer 0x7f1e8dbf2800 already has epoch 1350
2017-09-18 14:51:59.940855 7f1e78ca7700 20 osd.181 1350 share_map_peer 0x7f1e8dbf2800 already has epoch 1350
2017-09-18 14:52:00.081390 7f1e6f572700 20 osd.181 1350 OSD::ms_dispatch: ping magic: 0 v1
2017-09-18 14:52:00.081393 7f1e6f572700 10 osd.181 1350 do_waiters -- start
2017-09-18 14:52:00.081394 7f1e6f572700 10 osd.181 1350 do_waiters -- finish
2017-09-18 14:52:00.081395 7f1e6f572700 20 osd.181 1350 _dispatch 0x7f1e90923a40 ping magic: 0 v1
2017-09-18 14:52:00.081397 7f1e6f572700 10 osd.181 1350 ping from client.414556
2017-09-18 14:52:00.123908 7f1e71cdb700 10 trim shard target 102 M meta/data ratios 0.5 + 0 (52428 k + 0 ), current 1359 k (1083 k + 276 k)
2017-09-18 14:52:00.123926 7f1e71cdb700 10 trim shard target 102 M meta/data ratios 0.5 + 0 (52428 k + 0 ), current 1066 k (1066 k + 0 )
2017-09-18 14:52:00.123932 7f1e71cdb700 10 trim shard target 102 M meta/data ratios 0.5 + 0 (52428 k + 0 ), current 643 k (643 k + 0 )
2017-09-18 14:52:00.123937 7f1e71cdb700 10 trim shard target 102 M meta/data ratios 0.5 + 0 (52428 k + 0 ), current 1049 k (1049 k + 0 )
2017-09-18 14:52:00.123942 7f1e71cdb700 10 trim shard target 102 M meta/data ratios 0.5 + 0 (52428 k + 0 ), current 896 k (896 k + 0 )
2017-09-18 14:52:00.145445 7f1e784a6700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f1e61cbb700' had timed out after 60
2017-09-18 14:52:00.145450 7f1e784a6700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f1e624bc700' had timed out after 60
2017-09-18 14:52:00.145496 7f1e784a6700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f1e63cbf700' had timed out after 60
2017-09-18 14:52:00.145534 7f1e784a6700 10 osd.181 1350 internal heartbeat not healthy, dropping ping request
2017-09-18 14:52:00.146224 7f1e78ca7700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f1e61cbb700' had timed out after 60
2017-09-18 14:52:00.146226 7f1e78ca7700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f1e624bc700' had timed out after 60
~~~
Output of `thread apply all bt`:
~~~
Thread 54 (LWP 479360):
#0  0x00007f1e7b5606d5 in ?? ()
#1  0x0000000000000000 in ?? ()

Thread 53 (LWP 484888):
#0  0x00007f1e7a644b7d in ?? ()
#1  0x0000000000000000 in ?? ()

Thread 52 (LWP 484177):
#0  0x00007f1e7b5606d5 in ?? ()
#1  0x0000000a00000000 in ?? ()
#2  0x00007f1e88d8df98 in ?? ()
#3  0x00007f1e88d8df48 in ?? ()
#4  0x000000000000000a in ?? ()
#5  0x00007f1e5ccaf7f8 in ?? ()
#6  0x00007f1e7e45b9ee in ?? ()
#7  0x00007f1e88d8d860 in ?? ()
#8  0x00007f1e8e6e5500 in ?? ()
#9  0x00007f1e889881c0 in ?? ()
#10 0x00007f1e7e3e9ea0 in ?? ()
#11 0x0000000000000000 in ?? ()

Thread 51 (LWP 484176):
#0  0x00007f1e7b5606d5 in ?? ()
#1  0x0000000000000000 in ?? ()

Thread 50 (LWP 484175):
#0  0x00007f1e7b5606d5 in ?? ()
#1  0x0000000000000000 in ?? ()

Thread 49 (LWP 484174):
#0  0x00007f1e7b5606d5 in ?? ()
#1  0x0000000000000000 in ?? ()
~~~
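Since the frames above still resolve to `?? ()`, the debuginfo packages likely were not picked up by gdb. A capture procedure along these lines could be retried on the OSD host (a sketch; `debuginfo-install` comes from yum-utils on RHEL, and the exact package set is an assumption):

```shell
# Install matching debug symbols for the running ceph build (RHEL/yum-utils):
debuginfo-install -y ceph

# Attach to the first ceph-osd PID, dump all thread stacks, then detach:
gdb -p "$(pidof ceph-osd | awk '{print $1}')" \
    -batch -ex 'thread apply all bt' > gdb.txt

# Compress for the mailing list attachment:
gzip gdb.txt
```

If the versions of the debuginfo RPMs and the running binary differ, gdb will still print raw addresses, so it is worth checking that they match before re-capturing.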
Thanks
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Attachment: gdb.txt.gz (GNU Zip compressed data)