Re: [PATCH net-next v6 3/3] net/smc: Introduce IPPROTO_SMC

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 7 Jun 2024, Matthieu Baerts wrote:

Hi D.Wythe,

On 07/06/2024 07:09, D. Wythe wrote:

On 6/7/24 5:22 AM, Mat Martineau wrote:
On Wed, 5 Jun 2024, D. Wythe wrote:

From: "D. Wythe" <alibuda@xxxxxxxxxxxxxxxxx>

This patch allows to create smc socket via AF_INET,
similar to the following code,

/* create v4 smc sock */
v4 = socket(AF_INET, SOCK_STREAM, IPPROTO_SMC);

/* create v6 smc sock */
v6 = socket(AF_INET6, SOCK_STREAM, IPPROTO_SMC);

There are several reasons why we believe it is appropriate here:

1. For smc sockets, it actually use IPv4 (AF-INET) or IPv6 (AF-INET6)
address. There is no AF_SMC address at all.

2. Create smc socket in the AF_INET(6) path, which allows us to reuse
the infrastructure of AF_INET(6) path, such as common ebpf hooks.
Otherwise, smc have to implement it again in AF_SMC path.

Signed-off-by: D. Wythe <alibuda@xxxxxxxxxxxxxxxxx>
Tested-by: Niklas Schnelle <schnelle@xxxxxxxxxxxxx>
---
include/uapi/linux/in.h |   2 +
net/smc/Makefile        |   2 +-
net/smc/af_smc.c        |  16 ++++-
net/smc/smc_inet.c      | 169 +++++++++++++++++++++++++++++++++++++++
+++++++++
net/smc/smc_inet.h      |  22 +++++++
5 files changed, 208 insertions(+), 3 deletions(-)
create mode 100644 net/smc/smc_inet.c
create mode 100644 net/smc/smc_inet.h

diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h
index e682ab6..0c6322b 100644
--- a/include/uapi/linux/in.h
+++ b/include/uapi/linux/in.h
@@ -83,6 +83,8 @@ enum {
#define IPPROTO_RAW        IPPROTO_RAW
  IPPROTO_MPTCP = 262,        /* Multipath TCP connection */
#define IPPROTO_MPTCP        IPPROTO_MPTCP
+  IPPROTO_SMC = 263,        /* Shared Memory Communications        */
+#define IPPROTO_SMC        IPPROTO_SMC

Hello,

It's not required to assign IPPROTO_MPTCP+1 as your new IPPROTO_SMC
value. Making IPPROTO_MAX larger does increase the size of the
inet_diag_table. Values from 256 to 261 are usable for IPPROTO_SMC
without increasing IPPROTO_MAX.

Just for background: When we added IPPROTO_MPTCP, we chose 262 because
it is IPPROTO_TCP+0x100. The IANA reserved protocol numbers are 8 bits
wide so we knew we would not conflict with any future additions, and
in the case of MPTCP is was convenient that truncating the proto value
to 8 bits would match IPPROTO_TCP.

- Mat


Hi Mat,

Thank you very much for your feedback, I have always been curious about
the origins of IPPROTO_MPTCP and I am glad to
have learned new knowledge.


Hi D. Whythe -

Sure, you're welcome!

Regarding the size issue of inet_diag_tables, what you said does make
sense. However, we still hope to continue using 263,
although the rationale may not be fully sufficient, as this series has
been under community evaluation for quite some time now,
and we haven't received any feedback about this value, so we’ve been
using it in some user-space tools ... 🙁


It's definitely a tradeoff between the Linux UAPI that gets locked in forever vs. handling a transition with your userspace tools. If you change the numeric value of IPPROTO_SMC on the open source side you could transition internally by carrying a kernel patch that allows both the new and old value.

I would like to see what the community thinks. If everyone agrees that
using 263 will be completely unacceptable and a disaster,
then we will have no choice but to change it.

It will not be a disaster, but a small waste of space (even if
CONFIG_SMC is not set).

Well stated Matthieu :) I chose my "not required" wording carefully, as I didn't want to demand a change here but to make you aware of some of the tradeoffs to consider. And thankfully Matthieu remembered the userspace issues below.

Also, I see that one of the netdev maintainers flagged this v6 series as "changes requested" in patchwork so that may indicate their preference?


Also, please note that the introduction of IPPROTO_MPTCP caused some
troubles in some userspace programs. That was mainly because IPPROTO_MAX
got updated, and they didn't expect that, e.g. a quick search on GitHub
gave me this:

 https://github.com/systemd/systemd/issues/15604
 https://github.com/strace/strace/issues/164
 https://github.com/rust-lang/libc/issues/1896

I guess these userspace programs should now be ready for a new update,
but still, it might be better to avoid that if there is a "simple" solution.

I understand changing your userspace tools will be annoying. (On the
other hand, it is still time to do that :) )

Agreed!


- Mat

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Kernel Development]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite Info]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Linux Media]     [Device Mapper]

  Powered by Linux