RE: [PATCH] net/smc: mark as BROKEN due to remote memory exposure

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Doug,

> -----Original Message-----
> From: linux-rdma-owner@xxxxxxxxxxxxxxx [mailto:linux-rdma-
> owner@xxxxxxxxxxxxxxx] On Behalf Of Doug Ledford
> Sent: Wednesday, May 17, 2017 3:38 PM
> To: David Miller <davem@xxxxxxxxxxxxx>
> Cc: Bart.VanAssche@xxxxxxxxxxx; torvalds@xxxxxxxxxxxxxxxxxxxx;
> hch@xxxxxx; netdev@xxxxxxxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx;
> stable@xxxxxxxxxxxxxxx; ubraun@xxxxxxxxxxxxxxxxxx
> Subject: Re: [PATCH] net/smc: mark as BROKEN due to remote memory
> exposure
> 
> On Tue, 2017-05-16 at 15:28 -0400, Doug Ledford wrote:
> > I hadn't realized EXPERIMENTAL was gone.  Which is too bad, because
> > that's entirely appropriate in this case, and would have had the
> > desired side effect of keeping it out of any non-cutting edge distros
> > and warning people of possible API changes.  With EXPERIMENTAL gone,
> > the closest thing we have is drivers/staging, since that tends to
> > imply some of the same consequences.  I know you think BROKEN is
> > overly harsh, but I'm not sure we should just do nothing.  How about
> > we take a few days to let some of the RDMA people closely review the
> > 143 page
> > (egads!) rfc (http://www.rfc-editor.org/info/rfc7609) to see if we
> > think it can be fixed to use multiple link layers with the existing
> > API in place or if it will require something other than AF_SMC.  If we
> > need to break API, then I think we should either fix it ASAP and send
> > that fix to the 4.11 stable series (which probably violates the
> > normative stable patch size/scope) or if the fix will take longer than
> > this kernel cycle, then move it to staging both here and in 4.11
> > stable, and fix it there and then move it back.  Something like that
> > would prevent the kind of API flappage we ought not do....
> 
> So, I've skimmed the entire RFC, focusing on things were I needed to.
>  Here's my take on it:
> 
> It would have been better with AF_INET/AF_INET6 and an option to enable
> SMC than AF_SMC.  The first implementation simply assumes AF_INET in
> the presence of AF_SMC.  When IPv6 support is added, some sort of
> guessing logic will have to be put in place to try and determine if an
> AF_SMC address is actually AF_INET or AF_INET6 since we won't have a
> guaranteed way of telling.  Apps can use struct sockaddr_storage as their
> normal element to stick the address into, and could rely on the kernel to
> interpret it properly based on the AF_INET/AF_INET6 differentiation, and
> this breaks that.  The RFC gives *some* thought to adding IPv6 in the
> future, but not a lot.  It may be that the answer is that in the future, IPv6
> support is enabled by making the IPv6 API be
> AF_INET6 + setsockopt(SMC) or the equivalent.  If that's the case, then I
> would suggest making the later API specifically call out AF_INET +
> setsockopt(SMC) be identical to AF_SMC.
> 

What are the shortcomings in my proposal [1] which I am reiterating below.
Bart also suggested to define new stream protocol for SMC similar to SCTP.

(a) address family should be either AF_INET or AF_INET6
(b) socket() API to introduce new protocol as PROTO_SMC for in socket() system call.

With this there is no additional setsockop() needed.

With this - user space applications, do getaddrinfo() with hint as
hints.ai_family  = AF_INET;
hints.ai_socktype = SOCK_STREAM;

getaddrinfo() returns back both the protocols TCP and SMC.
Famous database application such as Redis client iterates over all entries of getaddrinfo() and establishes connection to servers.

There are few advantages of this interface.
1. No change in any makefile of applications needed - unless user wants to specify explicitly that it wants to force SMC protocol.
2. No need to do LD_LIBRARY_PRELOAD, (which won't work anyway because bind() connect() of SMC checks for AF_INET).
3. No major changes to glibc to process AF_SMC differently
glibc references IPPROTO_TCP at 22 places. (compare to AF_INET at 140+ places).
A lot simpler test matrix for glibc for new protocol
5. No need to recompile applications, as long as getaddrinfo returns all streaming protocols (TCP, SMC)
6. for applications to make use of setsockopt() it needs another knob and hint from other places, which can be avoided because SMC TCP option negotiates with remote end

And representing new protocol as new protocol for a given address family appears correct, compare to setting socket options.

Tools like CRIU or similar tool might find a race conditions - when it queries socket option, SMC was not set, but later on SMC was set, and does incorrect handling.
Setting socket() with SMC protocol makes it easier to understand in this area as well.

I have additional proposal for link groups, resource creation area. I will take that up after this discussion.

[1] https://patchwork.kernel.org/patch/9719375/

> The protocol included a version header in the negotation messages.
>  This is good as it allows us to almost immediately start work on version 2
> that fixes the shortcomings of version 1 while maintaining back
> compatibility.
> 
> After reading the RFC, I can see why they only support one link layer:
> RoCE.  The issue here is that they punted on the hard issue of doing any sort
> of work to determine if security restrictions on the TCP connection should
> also be applied to the RDMA connection.  The RFC basically says "the RoCE
> link needs to have the same physical restrictions (vlan) as the TCP/IP link so
> that the security limitations are the same" because they don't do anything
> to check them essentially.
>  For v2 of the protocol, and for different link layer support, this is no longer
> sufficient, so there will be significant work to determine the security context
> of specific TCP connections and then make sure that they meet the security
> context of the RDMA links allowed.
> 
> Additionally, the same is likely to be true in terms of SELinux options.  The
> addition of the IB/OPA link layers will throw a bit of a monkey wrench in
> things because the SELinux control over those links is slightly different than
> a normal TCP/IP SELinux control.  For instance, you might create rules about
> processes and ports to make sure that the httpd daemon can only access
> specific ports on TCP/IP, but on IB/OPA you would need to create process
> and P_Key rules as IB/OPA don't have the same port level semantics, and
> it's the P_Key on communications that is enforced network wide, including
> in the switches, so controlling what P_Key a process can send/receive on is
> your best way to limit what a process can do.  Translation from one to the
> other might be rather difficult to do in any sort of automated fashion.
> 
> There might have to be some additional work to get this to properly
> account items for the RDMA control group elements that were recently
> taken into the kernel.
> 
> Finally, the RDMA subsystem is in the process of switching to structured
> option processing similar to how netlink does it.  For version 2 of this
> protocol, since it will be interacting with the RDMA core in many ways, will
> be simpler if it switches the on-wire negotiation packets to follow the same
> methods as that will allow reuse of code that it will have to have for
> properly dealing with the RDMA subsystem in the future.
> 
> So, I'm fine with it being left as is since you accepted the patch that makes
> note of the memory registration insecurity in the Kconfig text.
>  The fact that this is a versioned protocol means that we can fix the things
> we see as being wrong without having to have it all right from the very
> start, it can be done incrementally.
> 
> --
> Doug Ledford <dledford@xxxxxxxxxx>
>     GPG KeyID: B826A3330E572FDD
> 
> Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the
> body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info
> at  http://vger.kernel.org/majordomo-info.html
��.n��������+%������w��{.n�����{���fk��ܨ}���Ơz�j:+v�����w����ޙ��&�)ߡ�a����z�ޗ���ݢj��w�f




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux