Re: [PATCH] nsfs: fix oops when ns->ops is not provided

Cong Wang <xiyou.wangcong@xxxxxxxxx> · Thu, 10 Jun 2021 17:14:24 -0700

On Mon, Jun 7, 2021 at 2:08 AM Christian Brauner
<christian.brauner@xxxxxxxxxx> wrote:
>
> On Sun, Jun 06, 2021 at 05:37:40PM -0700, Cong Wang wrote:
> > On Fri, Jun 4, 2021 at 2:54 AM Christian Brauner
> > <christian.brauner@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Jun 03, 2021 at 03:52:29PM -0700, Cong Wang wrote:
> > > > On Wed, Jun 2, 2021 at 2:14 AM Christian Brauner
> > > > <christian.brauner@xxxxxxxxxx> wrote:
> > > > > But the point is that ns->ops should never be accessed when that
> > > > > namespace type is disabled. Or in other words, the bug is that something
> > > > > in netns makes use of namespace features when they are disabled. If we
> > > > > handle ->ops being NULL we might be tapering over a real bug somewhere.
> > > >
> > > > It is merely a protocol between fs/nsfs.c and other namespace users,
> > > > so there is certainly no right or wrong here, the only question is which
> > > > one is better.
> > > >
> > > > >
> > > > > Jakub's proposal in the other mail makes sense and falls in line with
> > > > > how the rest of the netns getters are implemented. For example
> > > > > get_net_ns_fd_fd():
> > > >
> > > > It does not make any sense to me. get_net_ns() merely increases
> > > > the netns refcount, which is certainly fine for init_net too, no matter
> > > > CONFIG_NET_NS is enabled or disabled. Returning EOPNOTSUPP
> > > > there is literally saying we do not support increasing init_net refcount,
> > > > which is of course false.
> > > >
> > > > > struct net *get_net_ns_by_fd(int fd)
> > > > > {
> > > > >         return ERR_PTR(-EINVAL);
> > > > > }
> > > >
> > > > There is a huge difference between just increasing netns refcount
> > > > and retrieving it by fd, right? I have no idea why you bring this up,
> > > > calling them getters is missing their difference.
> > >
> > > This argument doesn't hold up. All netns helpers ultimately increase the
> > > reference count of the net namespace they find. And if any of them
> > > perform operations where they are called in environments wherey they
> > > need CONFIG_NET_NS they handle this case at compile time.
> >
> > Let me explain it in this more straight way: what is the protocol here
> > for indication of !CONFIG_XXX_NS? Clearly it must be ns->ops==NULL,
> > because all namespaces use the following similar pattern:
> >
> > #ifdef CONFIG_NET_NS
> >         net->ns.ops = &netns_operations;
> > #endif
> >
> > Now you are arguing the protocol is not this, but it is the getter of
> > open_related_ns() returns an error pointer.
>
> I don't understand what this is supposed to tell me.

This tells you whatever you called a bug, it is just a protocol.

You are trying to justify it as bug by interpreting is as a getter
like get_net_ns_by_fd(). None of them makes sense, neither
is this bug, nor it is any similar to get_net_ns_by_fd().

>
> >
> > >
> > > (Pluse they are defined in a central place in net/net_namespace.{c,h}.
> > > That includes the low-level get_net() function and all the others.
> > > get_net_ns() is the only one that's defined out of band. So get_net_ns()
> > > currently is arguably also misplaced.)
> >
> > Of course they do, only struct ns_common is generic. What's your
> > point? Each ns.ops is defined by each namespace too.
>
> All netns helpers should arguably be located in a central place
> including get_net_ns(). There's no need to spread such helpers
> everywhere. This is completely orthogonaly to struct ns_common.

I have no idea why you want to argue on something I don't disagree
with. Actually, the proposal from me only changes fs/nsfs.c, so you
do not even need to worry about file locations at all.

>
> >
> > >
> > > The problem I have with fixing this in nsfs is that it gives the
> > > impression that this is a bug in nsfs whereas it isn't and it
> > > potentially helps tapering over other bugs.
> >
> > Like I keep saying, this is just a protocol, there is no right or
> > wrong here. If the protocol is just ops==NULL, then there is nothing
> > wrong use it.
> >
> > (BTW, we have a lot of places that use ops==NULL as a protocol,
> > they work really well.)
> >
> > >
> > > get_net_ns() is only called for codepaths that call into nsfs via
> > > open_related_ns() and it's the only namespace that does this. But
> >
> > I am pretty sure userns does the same:
> >
> > 197         case NS_GET_USERNS:
> > 198                 return open_related_ns(ns, ns_get_owner);
>
> Maybe I wasn't clear enough, open_related_ns() is the only namespace
> that calls into nsfs via open_related_ns() __outside__ of fs/nsfs.c I
> thought that was pretty clear.

Why it matter which calls open_related_ns() here? The point is
ns_get_owner() and get_net_ns() are defined by each ns, IOW,
outside of fs/nsfs.c.

>
> But also...
>
> #ifdef CONFIG_USER_NS
> struct ns_common *ns_get_owner(struct ns_common *ns);
> #else
> static inline struct ns_common *ns_get_owner(struct ns_common *ns)
> {
>         return ERR_PTR(-EPERM);
> }
> #endif
>
> So ns_get_owner() returns -EPERM when !CONFIG_USER_NS so the callback
> handles the !CONFIG_USER_NS case. And that's what we were saying
> get_net_ns() should do.

Sure, thanks for pointing this out. This is unnecessary too, like I said,
if the protocol is simply ns.ops==NULL. No one says the current code
is perfect, all code can be improved, so using existing code can't justify
it.

>
> >
> >
> > > open_related_ns() is only well defined if CONFIG_<NAMESPACE_TYPE> is
> > > set. For example, none of the procfs namespace f_ops will be set for
> > > !CONFIG_NET_NS. So clearly the socket specific getter here is buggy as
> > > it doesn't account for !CONFIG_NET_NS and it should be fixed.
> >
> > If the protocol is just ops==NULL, then the core part should just check
> > ops==NULL. Pure and simple. I have no idea why you do not admit the
> > fact that every namespace intentionally leaves ops as NULL when its
> > config is disabled.
>
> I'm just going to quote myself:
>
> > > set. For example, none of the procfs namespace f_ops will be set for
> > > !CONFIG_NET_NS.
>
> If a given namespace type isn't selected then it will never appear in
> /proc/<pid>/ns/* which is why the proc_ns_operations aren't defined in
> fs/proc/namespaces.c.
>
> In other words, you can't get a file descriptor for a given namespace
> through proc or rather the nsfs part of proc when that namespace type
> isn't selected.

Who said open_related_ns() should return a fd in such a case?
Clearly it must return an error here.

>
> The open_related_ns() function is a function that is just there to give
> you a namespace fd and it assumes that when it is called that the
> namespace type is selected for or that the callback you're passing it
> handles that case.

Once again, this is just a protocol. Let me compare your protocol
with mine:

1. You want to use the getter as a protocol for indicating a ns is
disabled;

2. I prefer to use ns.ops==NULL as a protocol here.

And let me explain why 2) is better than 1):

a) The final code is less with 2), because all those ugly #ifdef's
are all gone. The getter would not even be called if ns.ops==NULL.

b) The code is more readable. The point of the getter is to increase
refcnt of a specific ns. There is nothing wrong to at least literally
increase the refcnt of init_net. With your approach, it simply says
getting init_net is not supported if !CONFIG_NET_NS, this is just
false.

c) It is slightly faster to error out. open_related_ns() would return
an error even before get_unused_fd_flags(). With your approach,
it defers to the getter, that is, after get_unused_fd_flags().

>
> For example, see you're own example about ns_get_owner() above.
>
> >
> > >
> > > Plus your fix leaks references to init netns without fixing get_net_ns()
> > > too.
> >
> > I thought it is 100% clear that this patch is not from me?
> >
> > Plus, the PoC patch from me actually suggests to change
> > open_related_ns(), not __ns_get_path(). I have no idea why you
> > both miss it.
>
> Turning this around, I'm not sure what your resistance to just doing it
> like ns_get_owner() is doing it is.

I have the same doubt. See above on why.

Thanks.