Re: v4 clientid uniquifiers in containers/namespaces

Chuck Lever III <chuck.lever@xxxxxxxxxx> · Mon, 7 Feb 2022 23:59:03 +0000

> On Feb 7, 2022, at 2:38 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> 
> On Mon, 2022-02-07 at 15:49 +0000, Chuck Lever III wrote:
>> 
>> 
>>> On Feb 7, 2022, at 9:05 AM, Benjamin Coddington
>>> <bcodding@xxxxxxxxxx> wrote:
>>> 
>>> On 5 Feb 2022, at 14:50, Benjamin Coddington wrote:
>>> 
>>>> On 5 Feb 2022, at 13:24, Trond Myklebust wrote:
>>>> 
>>>>> On Sat, 2022-02-05 at 10:03 -0500, Benjamin Coddington wrote:
>>>>>> Hi all,
>>>>>> 
>>>>>> Is anyone using a udev(-like) implementation with
>>>>>> NETLINK_LISTEN_ALL_NSID? It looks like that is at least necessary
>>>>>> to allow the init namespaced udev to receive notifications on
>>>>>> /sys/fs/nfs/net/nfs_client/identifier, which would be a pre-req to
>>>>>> automatically uniquify in containers.
>>>>>> 
>>>>>> I'm interested since it will inform whether I need to send patches
>>>>>> to systemd's udev, and potentially open the can of worms over
>>>>>> there. Yet it's not yet clear to me how an init namespaced udev
>>>>>> process can write to a netns sysfs path.
>>>>>> 
>>>>>> Another option might be to create yet another daemon/tool that
>>>>>> would listen specifically for these notifications.  Ugh.
>>>>>> 
>>>>>> Ben
>>>>>> 
>>>>> 
>>>>> I don't understand. Why do you need a new daemon/tool?
>>> 
>>> Because what we've got only works for the init namespace.
>>> 
>>> Udev won't get kobject notifications because it's not using
>>> NETLINK_LISTEN_ALL_NSID.
>>> 
>>> We need to figure out if we want:
>>> 
>>> 1) the init namespace udevd to handle all client_id uniquifiers
>>> 2) network namespaces to run their own udevd
>>> 3) or both.
>>> 
>>> I think 2 violates "least surprise", and 3 might not be something
>>> anyone ever wants.  If they do, we can fix it at that point.
>>> 
>>> So to make 1 work, we can try to change udevd, or maybe just hacking
>>> about with nfs_netns_object_child_ns_type will be sufficient.
>> 
>> I agree that 1 seems like the preferred approach, though
>> I don't have a technical suggestion at this point.
>> 
> 
> I strongly disagree. (1) requires the init namespace to have intimate
> knowledge of container internals. Why do we need to make that a
> requirement? That violates the expectation that containers are
> stateless by default, and also the expectation that they operate
> independently of the environment.
> 
> If you really do want external control over the uuid that is set, then
> it should be pretty trivial to do so by using the standard container
> tools for manipulating the namespace (e.g. to mount a file that is
> under control of the parent as /etc/nfs4-uuid.conf or whatever).
> 
> However in most cases that I can think of, if the container is doing
> its own NFS mounting, then it is going to have to be set up with its
> own nfs-utils, etc, so there is no reason why we can't also require
> udev.

What Ben described in 1. is more closely aligned with how I thought
containers work today.

But it could be that 2. gives the ability to migrate the guest
container to another physical host and take its nfs4_unique_id
with it.

I don't have a strong preference between the two. I'm in favor
of doing whichever gets us to "done" faster.
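
For illustration only, Trond's file-based suggestion might look roughly
like the sketch below when run inside the container. It assumes a
parent-controlled file is mounted into the namespace and holds the
uniquifier on its first non-comment line; that file format and the
helper names read_uniquifier() and apply_uniquifier() are hypothetical,
and the only path taken from this thread is
/sys/fs/nfs/net/nfs_client/identifier.

```python
from pathlib import Path


def read_uniquifier(conf_path):
    """Return the first non-blank, non-comment line of conf_path, or None.

    The one-value-per-file format with '#' comments is an assumption made
    for this sketch, not an existing nfs-utils convention.
    """
    try:
        text = Path(conf_path).read_text()
    except FileNotFoundError:
        return None
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            return line
    return None


def apply_uniquifier(conf_path, sysfs_path):
    """Copy the uniquifier from conf_path into sysfs_path.

    Returns True if a value was written, False if no value was found.
    sysfs_path would be the per-netns identifier attribute as seen from
    inside the container.
    """
    value = read_uniquifier(conf_path)
    if value is None:
        return False
    with open(sysfs_path, "w") as f:
        f.write(value)
    return True
```

Inside the container this could be called as
apply_uniquifier("/etc/nfs4-uuid.conf",
"/sys/fs/nfs/net/nfs_client/identifier"), whether from a udev rule or a
boot-time script. The config file name is only the placeholder Trond
used above.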

--
Chuck Lever