On Thu, Mar 1, 2018 at 10:57 PM, David Turner <drakonstein@xxxxxxxxx> wrote:
Blocked requests and slow requests are synonyms in Ceph. They are two names
for the exact same thing.
On Thu, Mar 1, 2018, 10:21 PM Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
On Thu, Mar 1, 2018 at 2:47 PM, David Turner <drakonstein@xxxxxxxxx> wrote:
`ceph health detail` should show you more information about the slow
requests. If the output is too much, you can grep for "blocked" or
similar. It should tell you which OSDs are involved, how long they've
been slow, etc. The default is for them to show '> 32 sec', but that may
very well be much longer, and `ceph health detail` will show that.
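For example, something along these lines ("blocked" is just the pattern
I usually grep for, adjust to taste):

ceph health detail
ceph health detail | grep -i blocked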
Hi David,
Thank you for the reply. Unfortunately, `ceph health detail` only shows
the blocked requests themselves, without details on specific OSDs. This
seems to be related to a compression setting on the pool; there is
nothing in the OSD logs.
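For reference, I am checking the pool's compression settings like this
(with the real pool name in place of <pool>):

ceph osd pool get <pool> compression_mode
ceph osd pool get <pool> compression_algorithm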
I replied to another compression thread as well. This would make sense,
since compression is new; in the past, all such issues showed up in the
OSD logs and were related to either the network or OSD hardware.
Regards,
Alex
On Thu, Mar 1, 2018 at 2:23 PM Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
Is there a switch to turn on the display of specific OSD issues? Or
does the output below indicate a generic problem, e.g. the network,
rather than any specific OSD?
2018-02-28 18:09:36.438300 7f6dead56700 0
mon.roc-vm-sc3c234@0(leader).data_health(46) update_stats avail 56%
total 15997 MB, used 6154 MB, avail 9008 MB
2018-02-28 18:09:41.477216 7f6dead56700 0 log_channel(cluster) log
[WRN] : Health check failed: 73 slow requests are blocked > 32 sec
(REQUEST_SLOW)
2018-02-28 18:09:47.552669 7f6dead56700 0 log_channel(cluster) log
[WRN] : Health check update: 74 slow requests are blocked > 32 sec
(REQUEST_SLOW)
2018-02-28 18:09:53.794882 7f6de8551700 0
mon.roc-vm-sc3c234@0(leader) e1 handle_command mon_command({"prefix":
"status", "format": "json"} v 0) v1
--
I was wrong about the pool compression mattering: an uncompressed pool
also generates these slow request messages.
The question is why there is no subsequent message identifying the
specific OSDs, as there was in Jewel and prior releases, e.g. this
example from Red Hat:
2015-08-24 13:18:10.024659 osd.1 127.0.0.1:6812/3032 9 : cluster [WRN]
6 slow requests, 6 included below; oldest blocked for > 61.758455 secs
2016-07-25 03:44:06.510583 osd.50 [WRN] slow request 30.005692 seconds
old, received at {date-time}: osd_op(client.4240.0:8
benchmark_data_ceph-1_39426_object7 [write 0~4194304] 0.69848840) v4
currently waiting for subops from [610]
In comparison, my Luminous cluster only shows the general slow/blocked message:
2018-03-01 21:52:54.237270 7f7e419e3700 0 log_channel(cluster) log
[WRN] : Health check failed: 116 slow requests are blocked > 32 sec
(REQUEST_SLOW)
2018-03-01 21:53:00.282721 7f7e419e3700 0 log_channel(cluster) log
[WRN] : Health check update: 66 slow requests are blocked > 32 sec
(REQUEST_SLOW)
2018-03-01 21:53:08.534244 7f7e419e3700 0 log_channel(cluster) log
[WRN] : Health check update: 5 slow requests are blocked > 32 sec
(REQUEST_SLOW)
2018-03-01 21:53:10.382510 7f7e419e3700 0 log_channel(cluster) log
[INF] : Health check cleared: REQUEST_SLOW (was: 5 slow requests are
blocked > 32 sec)
2018-03-01 21:53:10.382546 7f7e419e3700 0 log_channel(cluster) log
[INF] : Cluster is now healthy
So where are the details?
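In the meantime, the only way I see to dig further is to query the OSD
admin sockets directly on each OSD host, along the lines of (with the
actual OSD id in place of <id>):

ceph daemon osd.<id> dump_ops_in_flight
ceph daemon osd.<id> dump_historic_ops

and to watch per-OSD latencies with:

ceph osd perf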
Thanks,
Alex
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com