Re: Ceph RDMA Update

Liu, Changcheng <changcheng.liu@xxxxxxxxx> wrote on Wed, Nov 27, 2019 at 10:54 AM:
>
> Hi Haomai,
>    For Ceph RDMA, the following features need to be developed to refine the
>    RDMA messenger implementation in Ceph. What do you think of them?
>    1. Implement the public and cluster network messengers on RDMA separately.
>       Current status:
>         Only the cluster network messenger implementation supports RDMA.
>
>    2. Use the RNIC IP to look up the RDMA device, to simplify Ceph configuration.
>       Current status:
>         The RDMA device name has to be specified in the Ceph configuration.
>         However, the device name isn't always uniform across all the nodes.
>
>    3. Use RDMA-CM for connection management (see the sketch after this list).
>       Current status:
>        RDMA-CM connection management currently only works for iWARP in Ceph,
>        and it lacks continuous verification since few users run iWARP with
>        Ceph.
>        RDMA-CM is a better connection management method than the alternatives;
>        the implementation in Ceph needs to be extended to support RoCEv1 &
>        RoCEv2.
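
For items 2 and 3, a minimal client-side librdmacm flow is sketched below
(illustrative only, not the actual Ceph messenger code; connect_by_ip and the
queue sizes are made up). The point is that resolving the peer address through
rdma_cm also selects the local RDMA device, so no device name has to appear in
ceph.conf, and the same code path covers RoCEv1/RoCEv2/iWARP:

#include <rdma/rdma_cma.h>
#include <cstdio>

// Minimal client-side librdmacm sketch: resolving the peer's IP also picks
// the local RDMA device, so no device name is needed in the configuration.
static int connect_by_ip(const char *ip, const char *port) {
  rdma_addrinfo hints = {};
  rdma_addrinfo *res = nullptr;
  hints.ai_port_space = RDMA_PS_TCP;
  if (rdma_getaddrinfo(ip, port, &hints, &res))
    return -1;

  ibv_qp_init_attr qp_attr = {};
  qp_attr.cap.max_send_wr = qp_attr.cap.max_recv_wr = 16;
  qp_attr.cap.max_send_sge = qp_attr.cap.max_recv_sge = 1;

  // rdma_create_ep() resolves address and route and binds the cm_id to the
  // RDMA device that owns the matching GID (the "lookup by IP" in item 2).
  rdma_cm_id *id = nullptr;
  if (rdma_create_ep(&id, res, nullptr, &qp_attr)) {
    rdma_freeaddrinfo(res);
    return -1;
  }
  printf("resolved device: %s\n", ibv_get_device_name(id->verbs->device));

  // RoCEv1/RoCEv2/iWARP differences are handled inside librdmacm.
  int ret = rdma_connect(id, nullptr);
  rdma_freeaddrinfo(res);
  if (ret) {
    rdma_destroy_ep(id);
    return -1;
  }
  return 0;  // caller keeps 'id' and later calls rdma_disconnect()/rdma_destroy_ep()
}

The server side would use rdma_get_request()/rdma_accept() in the same way, so
one connection-management path could serve all three transports.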

I always support all of the RDMA efforts.

@Kefu Chai what's your idea under crimson dev?

>
> B.R.
> Changcheng
>
> On 10:27 Wed 27 Nov, Haomai Wang wrote:
> > Liu, Changcheng <changcheng.liu@xxxxxxxxx>
> > >
> > > @Haomai: Please check my reply.
> > >
> > > On 01:52 Wed 27 Nov, <haomai@xxxxxxxx> wrote:
> > > > Liu, Changcheng <changcheng.liu@xxxxxxxxx>
> > > > >
> > > > > On 02:53 Tue 19 Nov, haomai@xxxxxxxx wrote:
> > > > > > Liu, Changcheng <changcheng.liu@xxxxxxxxx>
> > > > > [Changcheng]:
> > > > >    1. Do we have a plan to use RDMA-CM connection management by default for RDMA in Ceph?
> > > > >       Currently, RDMA-CM connection management has been integrated into the Ceph code.
> > > > >       However, it only works when setting 'ms_async_rdma_cm=true', while the default value of ms_async_rdma_cm is false.
> > > > >       It's really not good that we maintain two connection management methods for RDMA in Ceph.
> > > > >
> > > > >       What about changing the default connection management to RDMA-CM?
> > > > If we have good test coverage for rdma-cm, it should be ok.
> > > [Changcheng]:
> > >  Once rdma-cm is used for connection management, it can support
> > >  RoCEv1/RoCEv2/iWARP, which would unify the Ceph RDMA configuration.
> >
> > sure
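
For reference, the settings being discussed look roughly like this in ceph.conf
today (option names as used by the current async messenger; mlx5_0 is just an
example device name):

[global]
ms_type = async+rdma
ms_async_rdma_device_name = mlx5_0   # the per-node name that item 2 above wants to drop
ms_async_rdma_cm = true              # currently defaults to false

Flipping the default of ms_async_rdma_cm would make the last line unnecessary,
and looking the device up by IP would remove the second one as well.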
> >
> > >
> > > > > >
> > > > > > >  1) Support multiple devices
> > > > > > >  [Changcheng]:
> > > > > > >     Do you mean separating the public & cluster networks and using RDMA on both of them?
> > > > > > >     Currently, Ceph can work with RDMA in one of the following ways:
> > > > > > >       a. Make no difference between the public & cluster networks; both use the same RDMA device port for the RDMA messenger.
> > > > > > >       OR
> > > > > > >       b. The public network runs on TCP (posix) and the cluster network runs on RDMA.
> > > > > > >  2) Enable unified ceph.conf for all ceph nodes
> > > > > > >  [Changcheng]:
> > > > > > >     Do you mean that on some nodes, Ceph needs to set a different RDMA device port to be used?
> > > > > >
> > > > > > hmm, yes
> > > [Changcheng]:
> > >   To avoid having to "set a different RDMA device port" per node, it's
> > >   better to look up the RDMA device by the RNIC IP address.
> > >   What do you think of it?
> >
> > yes, it's the proper way
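
For reference, that lookup can be done with plain libibverbs by walking each
device's GID table, since RoCE GIDs embed the port's IP address. A rough
sketch (illustrative only; find_device_by_ip is a made-up helper, IPv4/RoCE
only, and real code would also need IPv6 handling):

#include <infiniband/verbs.h>
#include <arpa/inet.h>
#include <cstdio>
#include <cstring>

// Find the device/port whose GID table contains an IPv4-mapped GID
// (::ffff:a.b.c.d) matching the given RNIC address.
static ibv_context *find_device_by_ip(const char *ipv4, uint8_t *port_out) {
  in_addr ip = {};
  if (inet_pton(AF_INET, ipv4, &ip) != 1)
    return nullptr;

  int num = 0;
  ibv_device **devs = ibv_get_device_list(&num);
  if (!devs)
    return nullptr;

  for (int i = 0; i < num; i++) {
    ibv_context *ctx = ibv_open_device(devs[i]);
    if (!ctx)
      continue;
    ibv_device_attr dev_attr = {};
    if (ibv_query_device(ctx, &dev_attr) == 0) {
      for (uint8_t port = 1; port <= dev_attr.phys_port_cnt; port++) {
        ibv_port_attr port_attr = {};
        if (ibv_query_port(ctx, port, &port_attr))
          continue;
        for (int idx = 0; idx < port_attr.gid_tbl_len; idx++) {
          ibv_gid gid = {};
          if (ibv_query_gid(ctx, port, idx, &gid))
            continue;
          // IPv4-mapped GID: bytes 10..11 are 0xff, bytes 12..15 hold the IP.
          if (gid.raw[10] == 0xff && gid.raw[11] == 0xff &&
              memcmp(&gid.raw[12], &ip.s_addr, 4) == 0) {
            printf("matched %s port %d (gid index %d)\n",
                   ibv_get_device_name(devs[i]), (int)port, idx);
            *port_out = port;
            ibv_free_device_list(devs);  // opened contexts stay valid
            return ctx;                  // caller owns and closes ctx
          }
        }
      }
    }
    ibv_close_device(ctx);
  }
  ibv_free_device_list(devs);
  return nullptr;
}

With rdma-cm the same result falls out of rdma_bind_addr()/rdma_resolve_addr(),
which is another argument for item 3.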
> >
> > >
> > > > > [Changcheng]:
> > > > >    2. If there's a plan to let both the public & cluster networks run RDMA on separate networks, we must use RDMA-CM for connection management, right?
> > > > not exactly, but with rdma-cm it will be easier for the code to support it
> > > [Changcheng]:
> > >  Yes, rdma-cm makes it easier for the code to support it.
> > >
> > > > > > It's a long story.....
> > > > > [Changcheng]:
> > > > >    3. Is this related to RDMA? Has it been implemented in Ceph?
> > > > I think we should refer to crimson-ceph to support this
> > > [Changcheng]:
> > >  Thanks for your info.
> > >
> > > >
> > > > > > it means registering the data buffers read from the storage device
> > > > > [Changcheng]:
> > > > >    4. Do you mean that we 1) create the RDMA Memory Region (MR) first, 2) use the MR in the bufferlist, and 3) post the bufferlist as a work request on the RDMA send queue so it is sent directly without using tx_copy_chunk?
> > > > yeap
> > > [Changcheng]:
> > >  This seems impossible. I don't know whether the bufferlist is only used
> > >  for message transmission. If we work in this direction, there could be a
> > >  lot of changes.
> >
> > hmm, most of the issues are related to the bufferlist ownership problem.
> > It's easy to implement this the way you describe, but it's hard to make it
> > stable because of unexpected memory-range-holding problems.
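
To make the ownership point concrete, the zero-copy path at the verbs level
would look roughly like this (a sketch only, not Ceph code; in practice the MR
would be cached rather than registered per message):

#include <infiniband/verbs.h>
#include <cstddef>

// Post a caller-owned buffer straight onto the send queue instead of copying
// it into a pre-registered chunk (the tx_copy_chunk path). The buffer must
// stay alive and unmodified until the completion is polled from the CQ, which
// is exactly the memory-range-holding risk mentioned above.
static int post_zero_copy_send(ibv_qp *qp, ibv_pd *pd, void *buf, size_t len) {
  ibv_mr *mr = ibv_reg_mr(pd, buf, len, IBV_ACCESS_LOCAL_WRITE);
  if (!mr)
    return -1;

  ibv_sge sge = {};
  sge.addr = reinterpret_cast<uint64_t>(buf);
  sge.length = static_cast<uint32_t>(len);
  sge.lkey = mr->lkey;

  ibv_send_wr wr = {};
  ibv_send_wr *bad = nullptr;
  wr.wr_id = reinterpret_cast<uint64_t>(mr);  // lets the CQ handler deregister the MR later
  wr.sg_list = &sge;
  wr.num_sge = 1;
  wr.opcode = IBV_WR_SEND;
  wr.send_flags = IBV_SEND_SIGNALED;          // request a completion for this WR

  return ibv_post_send(qp, &wr, &bad);
}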
> >
> > >
> > > > > > > II. ToDo:
> > > > > > >    1. Use RDMA Read/Write for better memory utilization
> > > > > > >    [Changcheng]:
> > > > > > >       Any plan to implement RDMA Read/Write? How do we solve the compatibility problem, since the previous implementation is based on RC-Send/RC-Recv?
> > > > > >
> > > > > > Maybe it's not a good idea now
> > > > > [Changcheng]:
> > > > >    5. Is there any background on why we don't use Read/Write semantics in the Ceph RDMA implementation?
> > > > from the vendors' info, Read/Write is not recommended.
> > > [Changcheng]:
> > >  OK. I don't have performance data on the difference
> > >  between Read/Write & Send/Recv. Let's talk about this later.
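
For completeness, at the verbs level the two semantics differ only in the
work-request shape: a Send consumes a receive buffer the peer posted, while an
RDMA Write lands directly in remote memory whose address and rkey were
exchanged out of band. A rough sketch (not Ceph code):

#include <infiniband/verbs.h>

// The same local scatter/gather entry posted either as a two-sided Send or as
// a one-sided RDMA Write into memory the peer registered and advertised.
static int post_send_or_write(ibv_qp *qp, ibv_sge *sge, bool use_write,
                              uint64_t remote_addr, uint32_t rkey) {
  ibv_send_wr wr = {};
  ibv_send_wr *bad = nullptr;
  wr.sg_list = sge;
  wr.num_sge = 1;
  wr.send_flags = IBV_SEND_SIGNALED;
  if (use_write) {
    wr.opcode = IBV_WR_RDMA_WRITE;         // no receive WR consumed on the peer
    wr.wr.rdma.remote_addr = remote_addr;  // peer's registered buffer address
    wr.wr.rdma.rkey = rkey;                // peer's rkey
  } else {
    wr.opcode = IBV_WR_SEND;               // matched by a receive the peer pre-posted
  }
  return ibv_post_send(qp, &wr, &bad);
}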
> >
> > --
> > Best Regards,
> >
> > Wheat



-- 

Best Regards,

Wheat
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx



