On Fri, 7 Jun 2024, Matthieu Baerts wrote:
Hi D.Wythe,
On 07/06/2024 07:09, D. Wythe wrote:
On 6/7/24 5:22 AM, Mat Martineau wrote:
On Wed, 5 Jun 2024, D. Wythe wrote:
From: "D. Wythe" <alibuda@xxxxxxxxxxxxxxxxx>
This patch allows to create smc socket via AF_INET,
similar to the following code,
/* create v4 smc sock */
v4 = socket(AF_INET, SOCK_STREAM, IPPROTO_SMC);
/* create v6 smc sock */
v6 = socket(AF_INET6, SOCK_STREAM, IPPROTO_SMC);
There are several reasons why we believe it is appropriate here:
1. For smc sockets, it actually use IPv4 (AF-INET) or IPv6 (AF-INET6)
address. There is no AF_SMC address at all.
2. Create smc socket in the AF_INET(6) path, which allows us to reuse
the infrastructure of AF_INET(6) path, such as common ebpf hooks.
Otherwise, smc have to implement it again in AF_SMC path.
Signed-off-by: D. Wythe <alibuda@xxxxxxxxxxxxxxxxx>
Tested-by: Niklas Schnelle <schnelle@xxxxxxxxxxxxx>
---
include/uapi/linux/in.h | 2 +
net/smc/Makefile | 2 +-
net/smc/af_smc.c | 16 ++++-
net/smc/smc_inet.c | 169 +++++++++++++++++++++++++++++++++++++++
+++++++++
net/smc/smc_inet.h | 22 +++++++
5 files changed, 208 insertions(+), 3 deletions(-)
create mode 100644 net/smc/smc_inet.c
create mode 100644 net/smc/smc_inet.h
diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h
index e682ab6..0c6322b 100644
--- a/include/uapi/linux/in.h
+++ b/include/uapi/linux/in.h
@@ -83,6 +83,8 @@ enum {
#define IPPROTO_RAW IPPROTO_RAW
IPPROTO_MPTCP = 262, /* Multipath TCP connection */
#define IPPROTO_MPTCP IPPROTO_MPTCP
+ IPPROTO_SMC = 263, /* Shared Memory Communications */
+#define IPPROTO_SMC IPPROTO_SMC
Hello,
It's not required to assign IPPROTO_MPTCP+1 as your new IPPROTO_SMC
value. Making IPPROTO_MAX larger does increase the size of the
inet_diag_table. Values from 256 to 261 are usable for IPPROTO_SMC
without increasing IPPROTO_MAX.
Just for background: When we added IPPROTO_MPTCP, we chose 262 because
it is IPPROTO_TCP+0x100. The IANA reserved protocol numbers are 8 bits
wide so we knew we would not conflict with any future additions, and
in the case of MPTCP is was convenient that truncating the proto value
to 8 bits would match IPPROTO_TCP.
- Mat
Hi Mat,
Thank you very much for your feedback, I have always been curious about
the origins of IPPROTO_MPTCP and I am glad to
have learned new knowledge.
Hi D. Whythe -
Sure, you're welcome!
Regarding the size issue of inet_diag_tables, what you said does make
sense. However, we still hope to continue using 263,
although the rationale may not be fully sufficient, as this series has
been under community evaluation for quite some time now,
and we haven't received any feedback about this value, so we’ve been
using it in some user-space tools ... 🙁
It's definitely a tradeoff between the Linux UAPI that gets locked in
forever vs. handling a transition with your userspace tools. If you change
the numeric value of IPPROTO_SMC on the open source side you could
transition internally by carrying a kernel patch that allows both the new
and old value.
I would like to see what the community thinks. If everyone agrees that
using 263 will be completely unacceptable and a disaster,
then we will have no choice but to change it.
It will not be a disaster, but a small waste of space (even if
CONFIG_SMC is not set).
Well stated Matthieu :) I chose my "not required" wording carefully, as I
didn't want to demand a change here but to make you aware of some of the
tradeoffs to consider. And thankfully Matthieu remembered the userspace
issues below.
Also, I see that one of the netdev maintainers flagged this v6 series as
"changes requested" in patchwork so that may indicate their preference?
Also, please note that the introduction of IPPROTO_MPTCP caused some
troubles in some userspace programs. That was mainly because IPPROTO_MAX
got updated, and they didn't expect that, e.g. a quick search on GitHub
gave me this:
https://github.com/systemd/systemd/issues/15604
https://github.com/strace/strace/issues/164
https://github.com/rust-lang/libc/issues/1896
I guess these userspace programs should now be ready for a new update,
but still, it might be better to avoid that if there is a "simple" solution.
I understand changing your userspace tools will be annoying. (On the
other hand, it is still time to do that :) )
Agreed!
- Mat