If we compare RDMA with the TCP/IP stack, as far as I know, we can use RDMA to offload the traffic and reduce CPU usage, which means the other components can use more CPU to improve some performance metrics, such as IOPS?
I will describe my environment in more detail and hope you can give me more advice about it.
- 1 pool
- 1 rbd
- 1 ceph-mon
- 8 ceph-hosts
- 1 fio server (fio compiled with librbd, and librbd compiled with RDMA support)
[global]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=rbd
clustername=ceph
runtime=120
iodepth=128
numjobs=6
group_reporting
size=256G
direct=1
ramp_time=5

[r75w25]
bs=4k
rw=randrw
rwmixread=75
Hung-Wei Chiu(邱宏瑋)
Computer Center, Department of Computer Science
National Chiao Tung University
RDMA is of interest to me, hence my comment below.
>> What surprised me is that the results in RDMA mode are almost the same as in basic mode: the IOPS, latency, throughput, etc.
Pardon my limited knowledge here, but if I read your ceph.conf and your notes correctly, it seems that you are using RDMA only for the “cluster/private network”? If so, how do you expect RDMA to improve client IOPS/latency/throughput?
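To make the distinction concrete, here is a minimal sketch of a split layout in ceph.conf. The cluster subnet is a hypothetical placeholder (only the 10.0.0.0/24 public range appears in the posted config), and ms_cluster_type is the per-network messenger override, assuming the build in use supports it. With such a layout only OSD replication/heartbeat traffic rides the RDMA-capable cluster network, while client I/O from fio/librbd still travels over the public network via TCP/IP:

[global]
# subnets are illustrative - substitute the real ones
public network = 10.0.0.0/24         # Intel 1G NIC, client traffic over TCP/IP
cluster network = 192.168.0.0/24     # Mellanox 10G NIC, OSD replication traffic
ms_cluster_type = async+rdma         # RDMA on the cluster network only (assumed option)
ms_async_rdma_device_name = mlx4_0

Conversely, if the goal is to see RDMA on the client path, the fio host would also need an RDMA-capable NIC on a network whose messenger uses async+rdma.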
--
Deepak
From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com] On Behalf Of Haomai Wang
Sent: Thursday, March 23, 2017 4:34 AM
To: Hung-Wei Chiu (邱宏瑋)
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: The performance of ceph with RDMA
On Thu, Mar 23, 2017 at 5:49 AM, Hung-Wei Chiu (邱宏瑋) <hwchiu@xxxxxxxxxxxxxx> wrote:
Hi,
I use the latest code (master branch, updated on 2017/03/22) to build Ceph with RDMA and use fio to test its IOPS/latency/throughput.
In my environment, I set up 3 hosts; the details of each host are listed below.
OS: ubuntu 16.04
Storage: SSD * 4 (256G * 4)
Memory: 64GB.
NICs: two NICs, one (Intel 1G) for the public network and the other (Mellanox 10G) for the private network.
There are 3 monitors and 24 OSDs equally distributed across the 3 hosts, which means each host contains 1 mon and 8 OSDs.
For my experiment, I use two configs, basic and RDMA.
Basic
[global]
fsid = 0612cc7e-6239-456c-978b-b4df781fe831
mon initial members = ceph-1,ceph-2,ceph-3
mon host = 10.0.0.15,10.0.0.16,10.0.0.17
osd pool default size = 2
osd pool default pg num = 1024
osd pool default pgp num = 1024
RDMA
[global]
fsid = 0612cc7e-6239-456c-978b-b4df781fe831
mon initial members = ceph-1,ceph-2,ceph-3
mon host = 10.0.0.15,10.0.0.16,10.0.0.17
osd pool default size = 2
osd pool default pg num = 1024
osd pool default pgp num = 1024
ms_type=async+rdma
ms_async_rdma_device_name = mlx4_0
What surprised me is that the results in RDMA mode are almost the same as in basic mode: the IOPS, latency, throughput, etc.
I also tried different fio parameter patterns, such as the read/write ratio and random or sequential operations.
All results are the same.
Yes, most of the latency comes from other components now, although we still want to avoid the extra copy on the RDMA side.
So the current RDMA backend only means it can be a choice compared to the TCP/IP network; more benefits need to come from the other components.
In order to figure out what's going on, I did the following steps.
1. Followed this article (https://community.mellanox.com/docs/DOC-2086) to verify my RDMA environment.
2. To make sure the network traffic is transmitted over RDMA, I dumped the traffic on the private network, and the answer is yes, it uses RDMA.
3. Modified ms_async_rdma_buffer_size to (256 << 10) - no change.
4. Modified ms_async_rdma_send_buffers to 2048 - no change.
5. Modified ms_async_rdma_receive_buffers to 2048 - no change (the three overrides are shown together below).
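For reference, the three overrides from steps 3 to 5 written out as ceph.conf-style lines (256 << 10 is 262144 bytes; everything else is assumed to stay at its default):

[global]
ms_async_rdma_buffer_size = 262144      # step 3: 256 << 10
ms_async_rdma_send_buffers = 2048       # step 4
ms_async_rdma_receive_buffers = 2048    # step 5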
After the above operations, my guess is that my Ceph setup is simply not one where RDMA can improve the performance.
Does anyone know what kind of Ceph environment (replica size, # of OSDs, # of mons, etc.) is good for RDMA?
Thanks in advance.
Best Regards,
Hung-Wei Chiu (邱宏瑋)
--
Computer Center, Department of Computer Science
National Chiao Tung University
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com