Re: msgr2 and NAT

Sebastien Han <shan@xxxxxxxxxx> · Fri, 8 Feb 2019 12:14:32 +0100

I believe most containerized (Docker/Kubernetes) deployments rely on
NAT (via bridge networking). Other CNI plugins might do things
differently, but I'm concerned about losing the ability to NAT in these
environments.

Thanks!
–––––––––
Sébastien Han
Principal Software Engineer, Storage Architect

"Always give 100%. Unless you're giving blood."

On Tue, Jan 29, 2019 at 3:13 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>
> On Mon, Jan 28, 2019 at 4:00 AM Sage Weil <sweil@xxxxxxxxxx> wrote:
> >
> > msgr1 has some super kludgey behavior that's used to detect what IP
> > address the client or daemon should identify as.  If the daemon hasn't
> > explicitly bound to a specific IP (i.e., it bound to 0.0.0.0, [::], or
> > didn't bind at all), then the first time it connects to another peer,
> > the peer sends the IP we appear to be connecting from in the initial
> > banner.
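The msgr1 address discovery described above can be illustrated with a toy exchange. This is a minimal Python sketch, not Ceph's wire format: the banner string `toy-msgr1`, its `ip:port` payload, and the helpers `serve_once` and `learn_my_addr` are all hypothetical. The accepting side reports the address it observed in `accept()` (behind NAT, that would be the translated address), and the connecting side simply adopts it.

```python
import socket
import threading

# Hypothetical toy banner; NOT Ceph's actual msgr1 wire format.
BANNER = "toy-msgr1"

def serve_once(listener: socket.socket) -> None:
    """Accept one connection and tell the peer which address we see it as."""
    conn, peer_addr = listener.accept()  # behind NAT, this is the translated address
    with conn:
        ip, port = peer_addr[0], peer_addr[1]
        conn.sendall(f"{BANNER} {ip}:{port}\n".encode())

def learn_my_addr(server_addr) -> str:
    """Connect and adopt whatever address the peer reports in its banner."""
    with socket.create_connection(server_addr) as s, s.makefile("r") as f:
        line = f.readline().strip()
    banner, observed = line.split(" ", 1)
    if banner != BANNER:
        raise ValueError(f"unexpected banner: {banner!r}")
    return observed  # the address the peer saw us connecting from

if __name__ == "__main__":
    listener = socket.create_server(("127.0.0.1", 0))
    server = threading.Thread(target=serve_once, args=(listener,))
    server.start()
    print("peer says we are", learn_my_addr(listener.getsockname()))
    server.join()
    listener.close()
```

Because the identity comes from the other end's view of the connection, a NATed client ends up identified by its translated address, which is the side effect discussed below.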
> >
> > It seems to have worked out mostly okay, but it's definitely a bit weird.
> > The first connection is always to the monitor, so the IP that an OSD or
> > MDS daemon uses is always the one on the same network as the monitor
> > (or whichever IP the kernel decides to use to route to it).
> >
> > A side-effect of this is that, in theory, a client that is behind NAT
> > could connect to a ceph cluster.  It will end up being identified by the
> > NATed IP that the cluster sees and the random 64-bit nonce.
> >
> > Note that (to my knowledge) this has never been tested, so it only
> > works in theory.
> >
> > The initial msgr2 implementation simplifies this by instead calling
> > getsockname(2) on the first outgoing connection to see what IP we're
> > connecting from.  That removes the weird dependency on the other end telling
> > us who we are, but it means that NAT won't work.
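For contrast, a minimal Python sketch of the getsockname(2) approach, assuming a plain TCP connection over loopback (the helper name `compare_views` is hypothetical). It returns the connecting side's own view of its address alongside the address the accepting side sees for it. On loopback the two match; behind NAT the first would be the private address and the second the translated one, which is why identifying by getsockname(2) alone breaks NAT.

```python
import socket

def compare_views():
    """Return (our getsockname() view, the address the acceptor sees for us)."""
    with socket.create_server(("127.0.0.1", 0)) as listener:
        # Outgoing connection: the kernel picks our local IP and port.
        with socket.create_connection(listener.getsockname()) as client:
            conn, seen_by_acceptor = listener.accept()
            with conn:
                # Local view of our own address, as the msgr2 approach would use.
                mine = client.getsockname()[:2]
                return mine, seen_by_acceptor[:2]

if __name__ == "__main__":
    print(compare_views())
```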
> >
> > So... should we try to make the NAT scenario work in msgr2?
> >
> > We can do it with a minor-ish change to have the accepting end share our
> > apparent IP sooner in the exchange (probably after the initial banner).
> > (The current code shares it as part of the server_ident, but that's too
> > late in the exchange to serve the same role it did in msgr1.)
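One way the proposed change could be expressed is as an address-selection policy on the connecting side. This is a hypothetical sketch, not Ceph code: `pick_identity_addr` and its parameters are invented for illustration, and it assumes the accepting end has already reported our apparent IP shortly after the banner (or not at all).

```python
from ipaddress import ip_address
from typing import Optional

def pick_identity_addr(bound_ip: Optional[str],
                       sockname_ip: str,
                       peer_reported_ip: Optional[str]) -> str:
    """Hypothetical policy for choosing the IP we identify as."""
    # 1. An explicit, non-wildcard bind always wins.
    if bound_ip is not None and not ip_address(bound_ip).is_unspecified:
        return bound_ip
    # 2. If the accepting end reported our apparent IP early in the
    #    handshake, use it; this is what would let NATed clients work.
    if peer_reported_ip is not None:
        return peer_reported_ip
    # 3. Otherwise fall back to the kernel's local view (getsockname).
    return sockname_ip
```

Keeping the explicit bind first preserves the current behavior for daemons that pin an address; only wildcard-bound endpoints would pick up the peer-reported address.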
> >
> > sage
>
> I know RBD has some test cases where it runs librbd clients *within* a
> QEMU VM via the built-in host NATing (e.g. OpenStack devstack test
> cases). Is there some reason why the OSDs care about the real IP
> address of the client?
>
>
> --
> Jason