On Wed, May 25, 2022 at 03:42:28PM +0200, Alexandra Winter wrote:
>
>
> On 24.05.22 09:49, Tony Lu wrote:
> > On Tue, May 24, 2022 at 02:52:07PM +0800, D. Wythe wrote:
> >> From: "D. Wythe" <alibuda@xxxxxxxxxxxxxxxxx>
> >>
> >> Hi Karsten,
> >>
> >> We are promoting SMC-R to the field of cloud computing, dues to the
> >> particularity of business on the cloud, the scale and the types of
> >> customer applications are unpredictable. As a participant of SMC-R, we
> >> also hope that SMC-R can cover more application scenarios. Therefore,
> >> many connection problems are exposed during this time. There are two
> >> main issue, one is that the establishment of a single connection takes
> >> longer than that of the TCP, another is that the degree of concurrency
> >> is low under multi-connection processing. This patch set is mainly
> >> optimized for the first issue, and the follow-up of the second issue
> >> will be synchronized in the future.
> >>
> >> In terms of communication process, under current implement, a TCP
> >> three-way handshake only needs 1-RTT time, while SMC-R currently
> >> requires 4-RTT times, including 2-RTT over IP(TCP handshake, SMC
> >> proposal & accept ) and 2-RTT over IB ( two times RKEY exchange), which
> >> is most influential factor affecting connection established time at the
> >> moment.
> >>
> >> We have noticed that single network interface card is mainstream on the
> >> cloud, dues to the advantages of cloud deployment costs and the cloud's
> >> own disaster recovery support. On the other hand, the emergence of RoCE
> >> LAG technology makes us no longer need to deal with multiple RDMA
> >> network interface cards by ourselves, just like NIC bonding does. In
> >> Alibaba, Roce LAG is widely used for RDMA.
> >
> > I think this is an interesting topic whether we need SMC-level link
> > redundancy. I agreed with that RoCE LAG and RDMA in cloud vendors handle
> > redundancy and failover in the lower layer, and do it transparently for
> > SMC.
> >
> > So let's move on, if a RDMA device has redundancy ability, we could make
> > SMC simpler by give an option for user-space or based on the device
> > capability (if we have this flag). This allows under layer to ensure the
> > reliability of link group.
> >
> > As RFC 7609 mentioned, we should do some extra work for reliability to
> > add link. It should be an optional work if the device have capability
> > for redundancy, and make link group simpler and faster (for the
> > so-called SMC-2RTT in this RFC).
> >
> > I also notice that RFC 7609 is released on August 2015, which is earlier
> > than RoCE LAG. RoCE LAG is provided after ConnectX-3/ConnectX-3 Pro in
> > kernel 4.0, and is available in 2017. And cloud vendors' RDMA adapters,
> > such as Alibaba Elastic RDMA adapter in [1].
> >
> > Given that, I propose whether the second link can be used as an option
> > in newly created link group. Also, if it is possible, RFC 7609 can be
> > updated or extend it for this nowadays case.
> >
> > Looking forward for your message, Karsten, D. Wythe and folks.
> >
> > [1] https://lore.kernel.org/linux-rdma/20220523075528.35017-1-chengyou@xxxxxxxxxxxxxxxxx/
> >
> > Thanks,
> > Tony Lu
> >
> Thank you D. Wythe for your proposals, the prototype and measurements.
> They sound quite promising to us.
>
> We need to carefully evaluate them and make sure everything is compatible
> with the existing implementations of SMC-D and SMC-R v1 and v2. In the
> typical s390 environment ROCE LAG is propably not good enough, as the card
> is still a single point of failure. So your ideas need to be compatible
> with link redundancy. We also need to consider that the extension of the
> protocol does not block other desirable extensions.
>
> Your prototype is very helpful for the understanding.
> Before submitting any code patches to net-next, we should agree on the
> details of the protocol extension. Maybe you could formulate your proposal
> in plain text, so we can discuss it here?
>
> We also need to inform you that several public holidays are upcoming in the
> next weeks and several of our team will be out for summer vacation, so please
> allow for longer response times.
>
> Kind regards
> Alexandra Winter
>

We're glad to hear this. It gives us a lot of confidence to keep working on
it, thank you.

Cheers,
Tony Lu