Hi, guys,
After some testing, we found a serious read performance problem with the Juno client (writes are fine), something like:
# rados bench -p test 30 seq
sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
  0       0         0         0         0         0         -         0
  1      16       100        84   335.843       336  0.020221 0.0393582
  2      16       100        84   167.944         0         - 0.0393582
  3      16       100        84   111.967         0         - 0.0393582
  4      16       100        84   83.9769         0         - 0.0393582
  5      16       100        84   67.1826         0         - 0.0393582
  6      16       100        84   55.9863         0         - 0.0393582
  7      16       100        84   47.9886         0         - 0.0393582
  8      16       100        84   41.9905         0         - 0.0393582
  9      16       100        84   37.3249         0         - 0.0393582
 10      16       100        84   33.5926         0         - 0.0393582
 11      16       100        84   30.5388         0         - 0.0393582
 12      16       100        84   27.9938         0         - 0.0393582
 13      16       100        84   25.8405         0         - 0.0393582
 14      16       100        84   23.9948         0         - 0.0393582
 15      16       100        84   22.3952         0         - 0.0393582
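For context: the seq test reads objects left by a previous write bench, so the pool had been populated beforehand with something like this (a sketch; the 30-second duration is illustrative):
# rados bench -p test 30 write --no-cleanup
# rados bench -p test 30 seq
Note how "finished" stalls at 84 after the first second while avg MB/s just decays, i.e. the 16 in-flight reads never complete.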
When testing an RBD image with fio (bs=512k, read), we see log entries like:
# grep 12067 ceph.client.log | grep read
2015-05-11 16:19:36.649554 7ff9949d5a00 1 -- 10.10.11.15:0/2012449 --> 10.10.11.21:6835/45746 -- osd_op(client.3772684.0:12067 rbd_data.262a6e7bf17801.0000000000000003 [sparse-read 2621440~524288] 7.c43a3ae3 e240302) v4 -- ?+0 0x7ff9967c5fb0 con 0x7ff99a41c420
2015-05-11 16:20:07.709915 7ff94bfff700 1 -- 10.10.11.15:0/2012449 <== osd.218 10.10.11.21:6835/45746 111 ==== osd_op_reply(12067 rbd_data.262a6e7bf17801.0000000000000003 [sparse-read 2621440~524288] v0'0 uv3803266 ondisk = 0) v6 ==== 199+0+524312 (3484234903 0 0) 0x7ff3a4002ba0 con 0x7ff99a41c420
From the timestamps above, that single 512k sparse-read took about 31 seconds (16:19:36.649554 to 16:20:07.709915), and some operations take more than a minute.
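For reference, the fio job was something along these lines (a sketch, assuming fio's rbd ioengine since the ops show up in the userspace client log; pool and image names are placeholders):
# fio --name=seqread --ioengine=rbd --clientname=admin --pool=<pool> --rbdname=<image> --rw=read --bs=512k --iodepth=16 --runtime=60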
I checked the OSD logs (at the default logging level; ceph.com says that when a request takes too long, the OSD will complain in its log), and I do see some slow 4k write requests, but no slow reads.
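The check was something like grepping for the OSD's "slow request" warnings (the log path may differ on your distribution):
# grep "slow request" /var/log/ceph/ceph-osd.*.log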
We have tested Giant, Firefly, and a self-built Emperor client, all with the same sad results.
The network between the OSDs and the all-in-one node is 10GbE. This is from client to OSD:
# iperf3 -c 10.10.11.25 -t 60 -i 1
Connecting to host 10.10.11.25, port 5201
[ 4] local 10.10.11.15 port 41202 connected to 10.10.11.25 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 1.09 GBytes 9.32 Gbits/sec 11 2.02 MBytes
[ 4] 1.00-2.00 sec 1.09 GBytes 9.35 Gbits/sec 34 1.53 MBytes
[ 4] 2.00-3.00 sec 1.09 GBytes 9.35 Gbits/sec 11 1.14 MBytes
[ 4] 3.00-4.00 sec 1.09 GBytes 9.37 Gbits/sec 0 1.22 MBytes
[ 4] 4.00-5.00 sec 1.09 GBytes 9.34 Gbits/sec 0 1.27 MBytes
and this is from OSD to client (there may be some problem with the client's interface bonding, since 10Gb cannot be reached; see the bonding check after this output):
# iperf3 -c 10.10.11.15 -t 60 -i 1
Connecting to host 10.10.11.15, port 5201
[ 4] local 10.10.11.25 port 43934 connected to 10.10.11.15 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 400 MBytes 3.35 Gbits/sec 1 337 KBytes
[ 4] 1.00-2.00 sec 553 MBytes 4.63 Gbits/sec 1 341 KBytes
[ 4] 2.00-3.00 sec 390 MBytes 3.27 Gbits/sec 1 342 KBytes
[ 4] 3.00-4.00 sec 395 MBytes 3.32 Gbits/sec 0 342 KBytes
[ 4] 4.00-5.00 sec 541 MBytes 4.54 Gbits/sec 0 346 KBytes
[ 4] 5.00-6.00 sec 405 MBytes 3.40 Gbits/sec 0 358 KBytes
[ 4] 6.00-7.00 sec 728 MBytes 6.11 Gbits/sec 1 370 KBytes
[ 4] 7.00-8.00 sec 741 MBytes 6.22 Gbits/sec 0 355 KBytes
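If it matters, the client's bond state can be inspected via the kernel's bonding status file (assuming a standard Linux bond named bond0):
# cat /proc/net/bonding/bond0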
The Ceph cluster is shared by this Juno deployment and an old Havana one (as mentioned, they use exactly the same rbd pool), and IO on Havana is fine. Any suggestions or advice, so that we can determine whether this is an issue with the client, the network, or the Ceph cluster, and proceed from there? I am new to Ceph and need some help.
Thanks