Re: v4 clientid uniquifiers in containers/namespaces

"Benjamin Coddington" <bcodding@xxxxxxxxxx> · Tue, 08 Feb 2022 06:32:09 -0500

On 7 Feb 2022, at 18:59, Chuck Lever III wrote:

On Feb 7, 2022, at 2:38 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:

On Mon, 2022-02-07 at 15:49 +0000, Chuck Lever III wrote:

On Feb 7, 2022, at 9:05 AM, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:

On 5 Feb 2022, at 14:50, Benjamin Coddington wrote:

On 5 Feb 2022, at 13:24, Trond Myklebust wrote:

On Sat, 2022-02-05 at 10:03 -0500, Benjamin Coddington wrote:
Hi all,

Is anyone using a udev(-like) implementation with NETLINK_LISTEN_ALL_NSID?
It looks like that is at least necessary to allow the init namespaced udev
to receive notifications on /sys/fs/nfs/net/nfs_client/identifier, which
would be a pre-req to automatically uniquify in containers.

I'm interested since it will inform whether I need to send patches to
systemd's udev, and potentially open the can of worms over there. Yet it's
not yet clear to me how an init namespaced udev process can write to a netns
sysfs path.

Another option might be to create yet another daemon/tool that would listen
specifically for these notifications.  Ugh.

Ben

I don't understand. Why do you need a new daemon/tool?

Because what we've got only works for the init namespace.

Udev won't get kobject notifications because it's not using
NETLINK_LISTEN_ALL_NSID.
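For illustration, here is a rough Python sketch of what a listener would have to do to see uevents from other network namespaces. It assumes the numeric values from <linux/netlink.h> (NETLINK_KOBJECT_UEVENT = 15, SOL_NETLINK = 270, NETLINK_LISTEN_ALL_NSID = 8); the helper names open_uevent_socket and parse_uevent are made up for this example, and this is not how udev itself is implemented.

```python
import socket

# Values taken from <linux/netlink.h>; defined locally because Python's
# socket module does not export all of them.
NETLINK_KOBJECT_UEVENT = 15
SOL_NETLINK = 270
NETLINK_LISTEN_ALL_NSID = 8
KERNEL_UEVENT_GROUP = 1  # multicast group carrying raw kernel uevents


def open_uevent_socket(all_nsid=True):
    """Open a kobject-uevent netlink socket (hypothetical helper, Linux only).

    Enabling NETLINK_LISTEN_ALL_NSID typically requires privilege.
    """
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW,
                         NETLINK_KOBJECT_UEVENT)
    if all_nsid:
        # Ask the kernel to deliver events from all namespaces that have
        # an nsid assigned in this one, not only the socket's own netns.
        sock.setsockopt(SOL_NETLINK, NETLINK_LISTEN_ALL_NSID, 1)
    sock.bind((0, KERNEL_UEVENT_GROUP))  # pid 0: let the kernel pick
    return sock


def parse_uevent(data):
    """Split a raw kernel uevent into its header and KEY=VALUE fields.

    The kernel format is 'action@devpath' followed by NUL-separated
    KEY=VALUE strings.
    """
    fields = [f for f in data.split(b"\0") if f]
    header = fields[0].decode()
    env = dict(f.decode().split("=", 1) for f in fields[1:] if b"=" in f)
    return header, env
```

A listener would call open_uevent_socket(), recv() in a loop, and pass each datagram to parse_uevent(). Without the setsockopt call, only events from the socket's own network namespace arrive.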

We need to figure out if we want:

1) the init namespace udevd to handle all client_id uniquifiers
2) we expect network namespaces to run their own udevd
3) or both.

I think 2 violates "least surprise", and 3 might not be something anyone
ever wants.  If they do, we can fix it at that point.

So to make 1 work, we can try to change udevd, or maybe just hacking about
with nfs_netns_object_child_ns_type will be sufficient.

I agree that 1 seems like the preferred approach, though
I don't have a technical suggestion at this point.

I strongly disagree. (1) requires the init namespace to have intimate
knowledge of container internals.

Not really, we're just distinguishing NFS clients in containers from NFS
clients on the host.  That doesn't require intimate knowledge, only a
mechanism to create a unique value per-container.

Why do we need to make that a requirement? That violates the expectation
that containers are stateless by default, and also the expectation that
they operate independently of the environment.

I'm not familiar with the expectation that containers are stateless by
default, or that they operate independently of the environment.

If you really do want external control over the uuid that is set, then
it should be pretty trivial to do so by using the standard container
tools for manipulating the namespace (e.g. to mount a file that is
under control of the parent as /etc/nfs4-uuid.conf or whatever).
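As a sketch of that approach, something inside the container could copy the value from the mounted file into the sysfs identifier. The Python below is only an illustration: the file format (first non-empty, non-comment line) and the function name apply_uniquifier are assumptions, and the config path is just the example name used above.

```python
from pathlib import Path

# Both paths appear in this thread; the config file name is only the
# example suggested above, not an established convention.
DEFAULT_CONF = Path("/etc/nfs4-uuid.conf")
DEFAULT_IDENTIFIER = Path("/sys/fs/nfs/net/nfs_client/identifier")


def apply_uniquifier(conf=DEFAULT_CONF, identifier=DEFAULT_IDENTIFIER):
    """Write the first usable line of `conf` into `identifier`.

    Assumed (hypothetical) file format: one value per file, with blank
    lines and lines starting with '#' ignored.
    """
    for line in Path(conf).read_text().splitlines():
        value = line.strip()
        if value and not value.startswith("#"):
            Path(identifier).write_text(value)
            return value
    raise ValueError(f"no uniquifier found in {conf}")
```

This would have to run inside the container's network namespace before the first mount, since the identifier file is per-netns.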

We're not looking for external control, just automation.  The NFS community
has decided that udev is the way to go here, so as long as we can get the
notifications to /some/ udev process, I feel confident we can make all of
this transparent.

The less we have to teach all the container tooling folks, the better for us.

However in most cases that I can think of, if the container is doing
its own NFS mounting, then it is going to have to be set up with its
own nfs-utils, etc, so there is no reason why we can't also require
udev.

I'm not as confident about this as you are.  Network namespaces are pretty
useful on their own to create independent network configurations or to
isolate hardware interfaces.  We've had a few surprising cases of customers
using them in creative ways.

There's a bit of a chicken and egg problem with 2, though.  If the nfs
module is loaded, the kernel notification gets sent as soon as you create
the namespace.  It's not going to wait for you to move or exec udev into that
network namespace, and the notification is lost.

Can't we just uniquify the namespaced NFS client ourselves, while still
exposing /sys/fs/nfs/net/nfs_client/identifier within the namespace?  That
way, if someone wants to run udev or use their own method of persistent id,
it's available to them within the container so they can.  Then we can move
forward, because the problem of distinguishing clients between the host and
netns is automagically solved.
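To make the idea concrete, here is a userspace sketch of deriving a distinct value per namespace from something the host already has. It is purely illustrative: how the kernel would actually generate such a value is not decided in this thread, and both the function derive_uniquifier and the notion of a "namespace token" are hypothetical.

```python
import uuid


def derive_uniquifier(host_uuid, netns_token=None):
    """Derive a per-namespace uniquifier (hypothetical scheme).

    host_uuid:   the host's existing uniquifier, as a UUID string
    netns_token: any string that differs between network namespaces;
                 None stands for the init namespace
    """
    base = uuid.UUID(host_uuid)
    if netns_token is None:
        # The host keeps its existing value unchanged.
        return str(base)
    # uuid5 is a deterministic hash of (namespace UUID, name), so each
    # token maps to a stable value distinct from the host's.
    return str(uuid.uuid5(base, netns_token))
```

The result is only as persistent as the token. A token that changes across reboots would give the client a new identity each boot, which is where a udev rule or another persistent method inside the container would still be useful.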

Where we are today is that the host NFS client is uniquified, and all the
netns clients are distinguished from the host, but not from each other.

Ben