Hi Haomai,
   Could you reply to the previous email?
B.R.
Changcheng

-----Original Message-----
From: Liu, Changcheng [mailto:changcheng.liu@xxxxxxxxx]
Sent: Tuesday, November 19, 2019 1:26 PM
To: haomai@xxxxxxxx
Cc: liupeng37@xxxxxxxxx; rpenyaev@xxxxxxx; dev@xxxxxxx
Subject: Re: Ceph RDMA Update

On 02:53 Tue 19 Nov, haomai@xxxxxxxx wrote:
> Liu, Changcheng <changcheng.liu@xxxxxxxxx>
>
> > Hi Haomai,
> > I read your presentation below:
> > Topic: CEPH RDMA UPDATE
> > Link: https://www.openfabrics.org/images/eventpresos/2017presentations/103_Ceph_HWang.pdf
> >
> > I want to talk about the items on page 17:
> > I. Work in Progress:
> > 1. RDMA-CM for control path
> > [Changcheng]:
> > Do you also prefer that we use RDMA-CM for connection management?
>
> RDMA-CM has a good wrapper

[Changcheng]:
1. Is there a plan to use RDMA-CM connection management by default for RDMA in Ceph?
   Currently, RDMA-CM connection management has been integrated into the Ceph code, but it only takes effect when 'ms_async_rdma_cm=true' is set, and the default value of ms_async_rdma_cm is false. It's really not good that we maintain two connection management methods for RDMA in Ceph. What about changing the default connection management to RDMA-CM? (A minimal ceph.conf sketch is appended at the end of this message.)

> > 1) Support multiple devices
> > [Changcheng]:
> > Do you mean separating the public & cluster networks and running RDMA on both of them?
> > Currently, Ceph can work over RDMA with one of the following setups:
> > a. Make no difference between the public & cluster networks; both use the same RDMA device port for the RDMA messenger.
> > OR
> > b. The public network is based on TCP (posix) and the cluster network runs on RDMA.
> > 2) Enable unified ceph.conf for all ceph nodes
> > [Changcheng]:
> > Do you mean that on some nodes, Ceph needs to be set to use a different RDMA device port?
>
> hmm, yes

[Changcheng]:
2. If the plan is to let both the public & cluster networks run RDMA on separate networks, we must use RDMA-CM for connection management, right? (A short RDMA-CM sketch is also appended at the end of this message.)

> > 2. Ceph replication Zero-copy
> > 1) Reduce number of memcpy by half by re-using data buffers on primary OSD
> > [Changcheng]:
> > What does it mean? Is there any technical sharing about this item?
>
> It's a long story.....

[Changcheng]:
3. Is this related to RDMA? Has it been implemented in Ceph?

> > 3. Tx zero-copy
> > Avoid copy out by using reged memory
> > [Changcheng]:
> > I've read the code: the function tx_copy_chunk copies data into segmented chunks to be sent. How do you solve the zero-copy problem?
>
> it means registering the data buffer read from the storage device

[Changcheng]:
4. Do you mean that we 1) create the RDMA Memory Region (MR) first, 2) use the MR in the bufferlist, and 3) post the bufferlist as a work request on the RDMA send queue so it is sent directly, without going through tx_copy_chunk? (A verbs sketch of this flow is appended at the end of this message.)

> > II. ToDo:
> > 1. Use RDMA Read/Write for better memory utilization
> > [Changcheng]:
> > Is there any plan to implement RDMA Read/Write? How would we solve the compatibility problem, since the previous implementation is based on RC-Send/RC-Recv?
>
> Maybe it's not a good idea now

[Changcheng]:
5. Is there any background on why we don't use Read/Write semantics in the Ceph RDMA implementation?

> > 2. ODP - On demand paging
> > [Changcheng]:
> > Do you mean the problem that "the registered Memory Region is pinned to physical pages and can't be swapped out"?
>
> No, it's a transparent registration technique; currently it's not available

[Changcheng]: Thanks for your info.

> > 3. Erasure-coding using HW offload.
> > [Changcheng]:
> > Is this related to the RDMA NIC?
>
> SMART-NIC

[Changcheng]: Thanks for your info.

> > B.R.
> > Changcheng

_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx
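
Appendix (the sketches referenced in comments 1, 2 and 4 above):

For comment 1 -- a minimal ceph.conf sketch of what "RDMA-CM by default" would look like. This is only an illustration with the option names as I understand them (ms_type, ms_async_rdma_cm, ms_async_rdma_device_name); mlx5_0 is just an example device name:

  [global]
    # run the async messenger over RDMA instead of posix/TCP
    ms_type = async+rdma
    # default is false today; the question above is whether this
    # should become the default connection management for RDMA
    ms_async_rdma_cm = true
    # per-node device selection -- having to change this on every
    # node is what prevents a unified ceph.conf
    ms_async_rdma_device_name = mlx5_0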
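
For comment 2 -- a rough librdmacm client-side sketch (not the Ceph code; connect_with_rdma_cm and the queue sizes are placeholders of mine) showing why RDMA-CM helps with multiple devices: the cm_id resolves the local device and port from the route to the destination IP, so nothing device-specific has to live in ceph.conf.

  // Rough sketch only: the RDMA device/port is picked by address
  // resolution, not by configuration.
  #include <rdma/rdma_cma.h>

  int connect_with_rdma_cm(const char* host, const char* port)
  {
      rdma_addrinfo hints = {}, *res = nullptr;
      hints.ai_port_space = RDMA_PS_TCP;
      if (rdma_getaddrinfo(host, port, &hints, &res))
          return -1;

      // rdma_create_ep builds the cm_id (and a QP) on whatever local
      // device the route to 'host' resolves to.
      ibv_qp_init_attr attr = {};
      attr.cap.max_send_wr = attr.cap.max_recv_wr = 16;
      attr.cap.max_send_sge = attr.cap.max_recv_sge = 1;
      attr.qp_type = IBV_QPT_RC;
      attr.sq_sig_all = 1;

      rdma_cm_id* id = nullptr;
      if (rdma_create_ep(&id, res, nullptr, &attr)) {
          rdma_freeaddrinfo(res);
          return -1;
      }
      rdma_freeaddrinfo(res);

      // QP parameters are exchanged in-band by the CM, so both sides
      // can share one unified configuration file.
      if (rdma_connect(id, nullptr)) {
          rdma_destroy_ep(id);
          return -1;
      }
      return 0;
  }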
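
For comment 4 -- a rough verbs-level sketch (again not the Ceph code; send_zero_copy and the pd/qp parameters are placeholders) of the register-then-post flow I was asking about, i.e. posting the data buffer itself instead of copying it into a pre-registered chunk as tx_copy_chunk does today:

  #include <infiniband/verbs.h>
  #include <cstdint>
  #include <cstring>

  // 'pd' and 'qp' are assumed to come from the existing connection setup.
  int send_zero_copy(ibv_pd* pd, ibv_qp* qp, void* payload, size_t len)
  {
      // 1) Register the data buffer itself as a Memory Region (MR).
      ibv_mr* mr = ibv_reg_mr(pd, payload, len, IBV_ACCESS_LOCAL_WRITE);
      if (!mr)
          return -1;

      // 2) Describe the registered buffer with a scatter/gather entry.
      ibv_sge sge;
      std::memset(&sge, 0, sizeof(sge));
      sge.addr   = reinterpret_cast<uintptr_t>(payload);
      sge.length = static_cast<uint32_t>(len);
      sge.lkey   = mr->lkey;

      // 3) Post it as a send work request -- no copy into a bounce chunk.
      ibv_send_wr wr, *bad_wr = nullptr;
      std::memset(&wr, 0, sizeof(wr));
      wr.wr_id      = reinterpret_cast<uintptr_t>(mr); // completion can dereg/reuse the MR
      wr.sg_list    = &sge;
      wr.num_sge    = 1;
      wr.opcode     = IBV_WR_SEND;
      wr.send_flags = IBV_SEND_SIGNALED;

      return ibv_post_send(qp, &wr, &bad_wr);
      // The MR must stay registered until the send completion is polled;
      // ibv_dereg_mr(mr) (or MR reuse) would happen there.
  }

Registration itself is expensive, so presumably an MR registration cache (or ODP, as on the slide) would be needed to make this worthwhile.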