Re: [RFC PATCH net-next] net/smc: Introduce IPPROTO_SMC for smc

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Nov 08, 2023 at 07:25:03PM +0800, D. Wythe wrote:
>From: "D. Wythe" <alibuda@xxxxxxxxxxxxxxxxx>
>
>This patch attempts to initiate a discussion on creating smc socket
>via AF_INET, similar to the following code snippet:
>
>/* create v4 smc sock */
>v4 = socket(AF_INET, SOCK_STREAM, IPPROTO_SMC);
>
>/* create v6 smc sock */
>v6 = socket(AF_INET6, SOCK_STREAM, IPPROTO_SMC);
>
>As we all know, the way we currently create an SMC socket as
>follows.
>
>/* create v4 smc sock */
>v4 = socket(AF_SMC, SOCK_STREAM, SMCPROTO_SMC);
>
>/* create v6 smc sock */
>v6 = socket(AF_SMC, SOCK_STREAM, SMCPROTO_SMC6);
>
>Note: This is not to suggest removing the SMC path, but rather to propose
>adding a new path (inet path).
>
>There are several reasons why we believe it is much better than AF_SMC:
>
>Semantics:
>
>SMC extends the TCP protocol and switches it's data path to RDMA path if
>RDMA link is ready. Otherwise, SMC should always try its best to degrade to
>TCP. From this perspective, SMC is a protocol derived from TCP and can also
>fallback to TCP, It should be considered as part of the same protocol
>family as TCP (AF_INET and AF_INET6).
>
>Compatibility & Scalability:
>
>Due to the presence of fallback, we needs to handle it very carefully to
>keep the consistent with the TCP sockets. SMC has done a lot of work to
>ensure that, but still, there are quite a few issues left, such as:
>
>1. The "ss" command cannot display the process name and ID associated with
>the fallback socket.
>
>2. The linger option is ineffective when user try’s to close the fallback
>socket.
>
>3. Some eBPF attach points related to INET_SOCK are ineffective under
>fallback socket, such as BPF_CGROUP_INET_SOCK_RELEASE.
>
>4. SO_PEEK_OFF is a un-supported sock option for fallback sockets, while
>it’s of course supported for tcp sockets.
>
>Of course, we can fix each issue one by one, but it is not a fundamental
>solution. Any changes on the inet path may require re-synchronization,
>including bug fixes, security fixes, tracing, new features and more. For
>example, there is a commit which we think is very valueable:
>
>commit 0dd061a6a115 ("bpf: Add update_socket_protocol hook")
>
>This commit allows users to modify dynamically the protocol before socket
>created through eBPF programs, which provides a more flexible approach
>than smc_run (LP_PRELOAD). It does not require the process restart
>and allows for controlling replacement at the connection level, whereas
>smc_run operates at the process level.
>
>However, to benefit from it under the SMC path requires additional
>code submission while nothing changes requires to do under inet path.
>
>I'm not saying that these issues cannot be fixed under smc path, however,
>the solution for these issues often involves duplicating work that already
>done on inet path. Thats to say, if we can be under the inet path, we can
>easily reuse the existing infrastructure.
>
>Performance:
>
>In order to ensure consistency between fallback sockets and TCP sockets,
>SMC creates an additional TCP socket. This introduces additional overhead
>of approximately 15%-20% for the establishment and destruction of fallback
>sockets. In fact, for the users we have contacted who have shown interest
>in SMC, ensuring consistency in performance between fallback and TCP has
>always been their top priority. Since no one can guarantee the
>availability of RDMA links, support for SMC on both sides, or if the
>user's environment is 100% suitable for SMC. Fallback is the only way to
>address those issues, but the additional performance overhead is
>unacceptable, as fallback cannot provide the benefits of RDMA and only
>brings burden right now.
>
>In inet path, we can embed TCP sock into SMC sock, when fallback occurs,
>the socket behaves exactly like a TCP socket. In our POC, the performance
>of fallback socket under inet path is almost indistinguishable from of
>tcp socket, with less than 1% loss. Additionally, and more importantly,
>it has full feature compatibility with TCP socket.
>
>Of course, it is also possible under smc path, but in that way, it
>would require a significant amount of work to ensure compatibility with
>tcp sockets, which most of them has already been done in inet path.
>And still, any changes in inet path may require re-synchronization.
>
>I also noticed that there have been some discussions on this issue before.
>
>Link: https://lore.kernel.org/stable/4a873ea1-ba83-1506-9172-e955d5f9ae16@xxxxxxxxxx/
>
>And I saw some supportive opinions here, maybe it is time to continue
>discussing this matter now.
>
>Signed-off-by: D. Wythe <alibuda@xxxxxxxxxxxxxxxxx>
>---
> include/uapi/linux/in.h | 2 ++
> 1 file changed, 2 insertions(+)
>
>diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h
>index e682ab6..0c6322b 100644
>--- a/include/uapi/linux/in.h
>+++ b/include/uapi/linux/in.h
>@@ -83,6 +83,8 @@ enum {
> #define IPPROTO_RAW		IPPROTO_RAW
>   IPPROTO_MPTCP = 262,		/* Multipath TCP connection		*/
> #define IPPROTO_MPTCP		IPPROTO_MPTCP
>+  IPPROTO_SMC = 263,		/* Shared Memory Communications		*/
>+#define IPPROTO_SMC		IPPROTO_SMC

I think adding a new IPPROTO_SMC protocol is good, but we need to make
sure this won't break AF_SMC.


Best regards,
Dust



>   IPPROTO_MAX
> };
> #endif
>-- 
>1.8.3.1



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Kernel Development]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite Info]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Linux Media]     [Device Mapper]

  Powered by Linux