Hi Junqin,
Is this the same cluster where bluestore was previously performing
slowly for you? Your disks are definitely being hammered pretty hard.
The throughput doesn't appear especially low for HDDs. I did notice
you're using an io-size of 8196 rather than 8192, so the writes are slightly non-4k-aligned. The latency is
the biggest oddity. Earlier you mentioned you are using a 20000Mb/s
network. Is that bonded 10GbE? We've seen cases in the past on a
customer cluster where the bonding configuration was causing some really
nasty network latency on certain routes.
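Something like the following would be a quick sanity check for both points. This is only a sketch: it assumes the bond interface is named bond0, and "ceph-2" is just a guessed peer hostname based on your ceph-1, so adjust names for your setup.
  cat /proc/net/bonding/bond0      # bond mode, slave link state, failure counts
  ping -c 100 -s 8192 ceph-2       # look for latency outliers between OSD nodes
  rbd bench --io-type write --io-size 8192 --io-threads 8 --io-total 4G --io-pattern rand rbd/rbdtest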
Mark
On 07/18/2017 05:42 AM, Junqin JQ7 Zhang wrote:
Hi,
I met "osds have slow requests" problem on Ceph luminous 12.1.0 FileStore on a newly installed cluster.
We have tested 10.2.* version before, but didn't see such kind of issue.
Could anyone help this? Thanks in advance.
The Ceph cluster is on 3 nodes.
I have tried restarting all OSDs several times, but the result is the same.
Here are the symptoms.
After running rbd bench with a high --io-threads value, I see the following error.
[root@ceph-1 ~]# rbd bench --io-type write --io-size 8196 --io-threads 8 --io-total 4G --io-pattern rand rbd/rbdtest
bench type write io_size 8196 io_threads 8 bytes 4294967296 pattern random
SEC OPS OPS/SEC BYTES/SEC
1 7043 7051.57 57794684.13
2 15416 7712.12 63208574.94
......
62 508945 8271.42 67792552.91
63 517257 8274.47 67817535.56
elapsed: 63 ops: 524033 ops/sec: 8211.96 bytes/sec: 67305203.40
2017-07-18 18:13:08.293445 7fb09ffff700 -1 librbd::ImageWatcher: 0x7fb090001ed0 image watch failed: 140396306931344, (107) Transport endpoint is not connected
2017-07-18 18:13:08.293483 7fb09ffff700 -1 librbd::Watcher: 0x7fb090001ed0 handle_error: handle=140396306931344: (107) Transport endpoint is not connected
Then get " osds have slow requests "
[root@ceph-1 osddebug]# ceph -s
cluster:
id: 1f1e3ab0-0e93-4e0e-8cda-fb5d67e072e0
health: HEALTH_WARN
34213 requests are blocked > 32 sec
8 osds have slow requests
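(For reference, a way to see which OSDs are affected and what the blocked ops are waiting on; the second command must be run on the node hosting that OSD, and osd.5 below is only an example picked from the perf output further down:)
  ceph health detail                    # lists the OSDs with blocked requests
  ceph daemon osd.5 dump_ops_in_flight  # per-op state via the admin socket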
And I see very high apply_latency.
[root@ceph-1 osddebug]# ceph osd perf
osd commit_latency(ms) apply_latency(ms)
23 0 28958
22 0 30457
9 0 35393
8 0 43926
7 0 45192
6 0 39647
5 0 52997
4 0 45622
3 0 36825
2 0 33094
1 0 49191
0 0 40646
10 0 29356
11 0 36629
12 0 28765
13 0 39133
14 0 38505
15 0 37818
16 0 30575
17 0 43622
18 0 41289
19 0 39110
20 0 39957
21 0 38773
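(If it helps, the per-OSD latency counters can be pulled from the admin socket on the OSD's host; osd.5 is just an example, and exact counter names vary between releases:)
  ceph daemon osd.5 perf dump | grep -i latency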
Here is iostat on one node while running the bench.
avg-cpu: %user %nice %system %iowait %steal %idle
7.31 0.00 4.73 42.98 0.00 44.98
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 8094.00 0.00 49184.00 12.15 0.23 0.03 0.00 0.03 0.03 21.00
sdb 0.00 0.00 0.00 8250.00 0.00 50144.00 12.16 0.26 0.03 0.00 0.03 0.03 23.50
sde 0.00 327.00 114.00 468.00 456.00 6644.00 24.40 11.94 20.24 46.76 13.77 1.72 100.00
sdg 0.00 399.00 145.00 492.00 580.00 5056.00 17.70 7.99 12.38 43.21 3.29 1.57 100.00
sdk 0.00 422.00 143.00 534.00 572.00 5780.00 18.77 8.55 13.63 43.87 5.53 1.48 100.00
sdi 0.00 322.00 115.00 581.00 460.00 12016.00 35.85 25.69 77.43 65.58 79.78 1.44 100.00
sdh 0.00 1914.00 25.00 436.00 100.00 17796.00 77.64 145.97 317.56 257.60 321.00 2.17 100.00
sdd 0.00 452.00 155.00 549.00 620.00 5756.00 18.11 8.32 11.85 41.14 3.58 1.42 100.00
sdf 0.00 5.00 11.00 590.00 44.00 2792.00 9.44 146.84 243.25 465.55 239.11 1.66 100.00
sdj 0.00 378.00 120.00 464.00 480.00 4824.00 18.16 7.84 13.23 50.51 3.59 1.71 100.00
sdc 0.00 2.00 0.00 6.00 0.00 76.00 25.33 0.05 8.50 0.00 8.50 8.50 5.10
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3 0.00 0.00 0.00 6.00 0.00 76.00 25.33 0.05 8.67 0.00 8.67 8.50 5.10
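(To map the saturated sdX devices back to OSDs and confirm which devices are the SSDs, something like this should work on each node, assuming each OSD data directory is a separate mount:)
  lsblk -o NAME,ROTA,SIZE,MOUNTPOINT   # ROTA=0 marks non-rotational (SSD) devices
  grep ceph /proc/mounts               # which device backs which OSD directory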
Log from ceph-osd:
2017-07-18 18:13:22.104639 7fe34c270700 0 log_channel(cluster) log [WRN] : slow request 43.722246 seconds old, received at 2017-07-18 18:12:38.381955: osd_op(client.76896.0:509128 0.7b 0:de320fa8:::rbd_data.ba112ae8944a.00000000000028a6:head [write 3813028~8196 [fadvise_random]] snapc 0=[] ondisk+write+known_if_redirected e1119) currently commit_sent
2017-07-18 18:13:22.104644 7fe34c270700 0 log_channel(cluster) log [WRN] : slow request 43.714032 seconds old, received at 2017-07-18 18:12:38.390168: osd_repop(client.76896.0:509193 0.29d e1119/1116 0:b97947f6:::rbd_data.ba112ae8944a.0000000000004654:head v 1119'17412) currently commit_sent
2017-07-18 18:13:22.104648 7fe34c270700 0 log_channel(cluster) log [WRN] : slow request 43.653849 seconds old, received at 2017-07-18 18:12:38.450352: osd_repop(client.76896.0:509685 0.1e e1119/1117 0:781ecb04:::rbd_data.ba112ae8944a.0000000000003c5f:head v 1119'13702) currently commit_sent
2017-07-18 18:13:23.105083 7fe34c270700 0 log_channel(cluster) log [WRN] : 1471 slow requests, 5 included below; oldest blocked for > 44.206332 secs
2017-07-18 18:13:23.105098 7fe34c270700 0 log_channel(cluster) log [WRN] : slow request 44.206332 seconds old, received at 2017-07-18 18:12:38.898459: osd_repop(client.76896.0:513404 0.3a1 e1119/1117 0:85dab8cb:::rbd_data.ba112ae8944a.0000000000001fb4:head v 1119'15378) currently commit_sent
2017-07-18 18:13:23.105104 7fe34c270700 0 log_channel(cluster) log [WRN] : slow request 44.119205 seconds old, received at 2017-07-18 18:12:38.985585: osd_op(client.76896.0:514135 0.2e0 0:07575ed3:::rbd_data.ba112ae8944a.0000000000004782:head [write 2665884~8196 [fadvise_random]] snapc 0=[] ondisk+write+known_if_redirected e1119) currently commit_sent
2017-07-18 18:13:23.105109 7fe34c270700 0 log_channel(cluster) log [WRN] : slow request 44.098773 seconds old, received at 2017-07-18 18:12:39.006017: osd_op(client.76896.0:514296 0.306 0:60da6e83:::rbd_data.ba112ae8944a.0000000000002a78:head [write 2333184~8196 [fadvise_random]] snapc 0=[] ondisk+write+known_if_redirected e1119) currently commit_sent
2017-07-18 18:13:23.105125 7fe34c270700 0 log_channel(cluster) log [WRN] : slow request 44.063825 seconds old, received at 2017-07-18 18:12:39.040965: osd_repop(client.76896.0:514585 0.1e e1119/1117 0:783122c6:::rbd_data.ba112ae8944a.0000000000002292:head v 1119'13711) currently commit_sent
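(For reference, the full event timeline of recent slow ops, including journal commit versus filestore apply timing, can be dumped on an affected OSD via the admin socket; osd.5 is only an example:)
  ceph daemon osd.5 dump_historic_ops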
I also checked dmesg and found the following log, but it does not seem to be from the same time as the bench run.
[22932.710120] INFO: task filestore_sync:5128 blocked for more than 120 seconds.
[22932.710121] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[22932.710123] filestore_sync D 00007f902b69c708 0 5128 1 0x00000080
[22932.710126] ffff885f8d78fd50 0000000000000086 ffff885fbe6f9f60 ffff885f8d78ffd8
[22932.710130] ffff885f8d78ffd8 ffff885f8d78ffd8 ffff885fbe6f9f60 ffff885f8d78fe80
[22932.710133] ffff885f8d78fe88 7fffffffffffffff ffff885fbe6f9f60 00007f902b69c708
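(To check whether these hung-task messages line up with the bench run, dmesg can print wall-clock timestamps:)
  dmesg -T | grep -A 2 "blocked for more than"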
Here is my OSD configuration.
[osd.0]
host = ceph-1
osd journal = /var/lib/ceph/osd/ceph-0-journal/journal # an SSD partition
osd data = /var/lib/ceph/osd/ceph-0 # an HDD
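(To double-check that each journal really lives on the SSD and to see how many journals share each SSD, assuming the other OSDs follow the same naming as osd.0:)
  df -h /var/lib/ceph/osd/ceph-*-journal
  ls -l /var/lib/ceph/osd/ceph-*-journal/journal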
Could anyone advise on how to debug this further?
Thanks a lot!
B.R.
Junqin Zhang
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html