Hi, guys,
After some testing, we found a serious read performance problem with the Juno client (writes are fine), something like:
# rados bench -p test 30 seq
sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
  0       0         0         0         0         0         -         0
  1      16       100        84   335.843       336  0.020221 0.0393582
  2      16       100        84   167.944         0         - 0.0393582
  3      16       100        84   111.967         0         - 0.0393582
  4      16       100        84   83.9769         0         - 0.0393582
  5      16       100        84   67.1826         0         - 0.0393582
  6      16       100        84   55.9863         0         - 0.0393582
  7      16       100        84   47.9886         0         - 0.0393582
  8      16       100        84   41.9905         0         - 0.0393582
  9      16       100        84   37.3249         0         - 0.0393582
 10      16       100        84   33.5926         0         - 0.0393582
 11      16       100        84   30.5388         0         - 0.0393582
 12      16       100        84   27.9938         0         - 0.0393582
 13      16       100        84   25.8405         0         - 0.0393582
 14      16       100        84   23.9948         0         - 0.0393582
 15      16       100        84   22.3952         0         - 0.0393582
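For context: the seq test reads objects left by a previous write bench, so the pool had been populated beforehand with something like this (a sketch; the 30-second duration is illustrative):
# rados bench -p test 30 write --no-cleanup
# rados bench -p test 30 seq
Note how "finished" stalls at 84 after the first second while avg MB/s just decays, i.e. the 16 in-flight reads never complete.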
When testing an RBD image with fio (bs=512k, read), we see log entries like:
# grep 12067 ceph.client.log | grep read
2015-05-11 16:19:36.649554 7ff9949d5a00 1 -- 10.10.11.15:0/2012449 --> 10.10.11.21:6835/45746 -- osd_op(client.3772684.0:12067 rbd_data.262a6e7bf17801.0000000000000003 [sparse-read 2621440~524288] 7.c43a3ae3 e240302) v4 -- ?+0 0x7ff9967c5fb0 con 0x7ff99a41c420
2015-05-11 16:20:07.709915 7ff94bfff700 1 -- 10.10.11.15:0/2012449 <== osd.218 10.10.11.21:6835/45746 111 ==== osd_op_reply(12067 rbd_data.262a6e7bf17801.0000000000000003 [sparse-read 2621440~524288] v0'0 uv3803266 ondisk = 0) v6 ==== 199+0+524312 (3484234903 0 0) 0x7ff3a4002ba0 con 0x7ff99a41c420
From the timestamps above, that single 512k sparse-read took about 31 seconds (16:19:36.649554 to 16:20:07.709915), and some operations take more than a minute.
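For reference, the fio job was something along these lines (a sketch, assuming fio's rbd ioengine since the ops show up in the userspace client log; pool and image names are placeholders):
# fio --name=seqread --ioengine=rbd --clientname=admin --pool=<pool> --rbdname=<image> --rw=read --bs=512k --iodepth=16 --runtime=60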
I checked the OSD logs (at the default logging level; ceph.com says that when a request takes too long, the OSD will complain in its log), and I do see some slow 4k write requests, but no slow reads.
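The check was something like grepping for the OSD's "slow request" warnings (the log path may differ on your distribution):
# grep "slow request" /var/log/ceph/ceph-osd.*.log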
We have tested Giant, Firefly, and a self-built Emperor client, all with the same sad results.
The network between the OSDs and the all-in-one node is 10GbE. This is from client to OSD:
# iperf3 -c 10.10.11.25 -t 60 -i 1
Connecting to host 10.10.11.25, port 5201
[ 4] local 10.10.11.15 port 41202 connected to 10.10.11.25 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 1.09 GBytes 9.32 Gbits/sec 11 2.02 MBytes
[ 4] 1.00-2.00 sec 1.09 GBytes 9.35 Gbits/sec 34 1.53 MBytes
[ 4] 2.00-3.00 sec 1.09 GBytes 9.35 Gbits/sec 11 1.14 MBytes
[ 4] 3.00-4.00 sec 1.09 GBytes 9.37 Gbits/sec 0 1.22 MBytes
[ 4] 4.00-5.00 sec 1.09 GBytes 9.34 Gbits/sec 0 1.27 MBytes
and this is from OSD to client (there may be some problem with the client's interface bonding, since 10Gb cannot be reached; see the bonding check after this output):
# iperf3 -c 10.10.11.15 -t 60 -i 1
Connecting to host 10.10.11.15, port 5201
[ 4] local 10.10.11.25 port 43934 connected to 10.10.11.15 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 400 MBytes 3.35 Gbits/sec 1 337 KBytes
[ 4] 1.00-2.00 sec 553 MBytes 4.63 Gbits/sec 1 341 KBytes
[ 4] 2.00-3.00 sec 390 MBytes 3.27 Gbits/sec 1 342 KBytes
[ 4] 3.00-4.00 sec 395 MBytes 3.32 Gbits/sec 0 342 KBytes
[ 4] 4.00-5.00 sec 541 MBytes 4.54 Gbits/sec 0 346 KBytes
[ 4] 5.00-6.00 sec 405 MBytes 3.40 Gbits/sec 0 358 KBytes
[ 4] 6.00-7.00 sec 728 MBytes 6.11 Gbits/sec 1 370 KBytes
[ 4] 7.00-8.00 sec 741 MBytes 6.22 Gbits/sec 0 355 KBytes
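If it matters, the client's bond state can be inspected via the kernel's bonding status file (assuming a standard Linux bond named bond0):
# cat /proc/net/bonding/bond0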
The Ceph cluster is shared by this Juno deployment and an old Havana one (as mentioned, they use exactly the same rbd pool), and IO on Havana is fine. Any suggestions or advice, so that we can determine whether this is an issue with the client, the network, or the Ceph cluster, and proceed from there? I am new to Ceph and need some help.
Thanks