Thank you for great comments. Thanks to you I understand what i missed. I try to generate problem on mainline But, as you explained that situation isn't happen, Maybe my other things which I made generate some problem (freeing network namespace..) Thanks for great answering and sharing. If I meet the situation, at that time I'll share. Thank you very much! P.S. If I have a question, Could I ask via e-mail like this? On Wed, Apr 8, 2020 at 3:26 AM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > > On Tue, Apr 07, 2020 at 09:53:29PM +0900, Yun Levi wrote: > > BTW, It's my question. > > > > >Look: we call ns_get_path(), which calls ns_get_path_cb(), which > > >calls the callback passed to it (ns_get_path_task(), in this case), > > >which grabs a reference to ns. Then we pass that reference to > > >__ns_get_path(). > > > > > >__ns_get_path() looks for dentry stashed in ns. If there is one > > >and it's still alive, we grab a reference to that dentry and drop > > >the reference to ns - no new inodes had been created, so no new > > >namespace references have appeared. Existing inode is pinned > > >by dentry and dentry is pinned by _dentry_ reference we've got. > > > > actually ns_get_path is called in unshare(2). > > Yes, it does. Via perf_event_namespaces(), which does > perf_fill_ns_link_info(&ns_link_info[NET_NS_INDEX], > task, &netns_operations); > and there we have > error = ns_get_path(&ns_path, task, ns_ops); > if (!error) { > ns_inode = ns_path.dentry->d_inode; > ns_link_info->dev = new_encode_dev(ns_inode->i_sb->s_dev); > ns_link_info->ino = ns_inode->i_ino; > path_put(&ns_path); > } > See that path_put()? Dentry reference is dropped by it. > > > and it makes new dentry and > > inode in __ns_get_path finally (Cuz it create new network namespace) > > > > In that case, when I mount with --bind option to this proc/self/ns/net, it > > only increase dentry refcount on do_loopback->clone_mnt finally (not call > > netns_operation->get) > > That means it's not increase previous created network namespace reference > > count but only increase reference count of _dentry_ > > > > In that situation, If I exit the child process it definitely frees the > > net_namespace previous created at the same time it decrease net_namespace's > > refcnt in exit_task_namespace(). > > It means it's possible that bind mount point can hold the dentry with inode > > having net_namespace's dangling pointer in another process. > > In above situation, parent who know that binded mount point calls setns(2) > > then it sets the net_namespace which is refered by the inode of the dentry > > increased by do_loopback. > > That makes set the net_namespace which was freed already. > > How? Netns reference in inode contributes to netns refcount. And it's held > for as long as the _inode_ has positive refcount - we only drop it from > the inode destructor, *NOT* every time we drop a reference to inode. > In the similar fashion, the inode reference in dentry contributes to inode > refcount. And again, that inode reference won't be dropped until the _last_ > reference to dentry gets dropped. > > Incrementing refcount of dentry is enough to pin the inode and thus the > netns the inode refers to. It's a very common pattern with refcounting; > a useful way of thinking about it is to consider the refcount of e.g. > inode as sum of several components, one of them being "number of struct > dentry instances with ->d_inode pointing to our inode". And look at e.g. > __ns_get_path() like this: > rcu_read_lock(); > d = atomic_long_read(&ns->stashed); > if (!d) > goto slow; > dentry = (struct dentry *)d; > if (!lockref_get_not_dead(&dentry->d_lockref)) > goto slow; > other_count(dentry) got incremented by 1. > rcu_read_unlock(); > ns->ops->put(ns); > other_count(ns) decremented by 1. > got_it: > path->mnt = mntget(mnt); > path->dentry = dentry; > path added to paths(dentry), other_count(dentry) decremented by 1 (getting > it back to the original value). > return 0; > slow: > rcu_read_unlock(); > inode = new_inode_pseudo(mnt->mnt_sb); > if (!inode) { > ns->ops->put(ns); > subtract 1 from other_count(ns) > return -ENOMEM; > } > dentries(inode) = empty > other_count(inode) = 1 > .... > inode->i_private = ns; > add inode to inodes(ns), subtract 1 from other_count(ns); the total > is unchanged. > dentry = d_alloc_anon(mnt->mnt_sb); > if (!dentry) { > iput(inode); > subtract 1 from other_count(inode). Since now all components of > inode refcount are zero, inode gets destroyed. Destructor calls > nsfs_evict_inode(), which removes the inode from inodes(ns). > The total effect: inode is destroyed, inodes(ns) is back to what > it used to be and other_count(ns) is left decremented by 1 compared > to what we used to have. IOW, the balance is the same as if inode > allocation would've failed. > return -ENOMEM; > } > other_count(dentry) = 1 > d_instantiate(dentry, inode); > add dentry to dentries(inode), subtract 1 from other_count(inode). > The total is unchanged. Now other_count(inode) is 0 and dentries(inode) > is {dentry}. > d = atomic_long_cmpxchg(&ns->stashed, 0, (unsigned long)dentry); > if (d) { > somebody else has gotten there first > d_delete(dentry); /* make sure ->d_prune() does nothing */ > dput(dentry); > subtract 1 from other_count(dentry) (which will drive it to 0). Since > no other references exist, dentry gets destroyed. Destructor will > remove dentry from dentries(inode) and since other_count(inode) is already > zero, trigger destruction of inode. That, in turn, will remove inode > from inodes(ns). Total effect: dentry is destroyed, inode is destroyed, > inodes(ns) is back to what it used to be, other_count(ns) is left decremented > by 1 compared to what we used to have. > cpu_relax(); > return -EAGAIN; > } > goto got_it; > got_it: > path->mnt = mntget(mnt); > path->dentry = dentry; > add path to paths(dentry), subtract 1 from other_count(dentry). At that > point other_count(dentry) is back to 0, ditto for other_count(inode) and > other_count(ns) is left decremented by 1 compared to what it used to be. > inode is added to inodes(ns), dentry - to dentries(inode) and path - to > paths(dentry). > return 0; > and we are done. > > In all cases the total effect is the same as far as "other" counts go: > other_count(ns) is down by 1 and that's the only change in other_count() > of *ANY* objects. Of course we do not keep track of the sets explicitly; > it would cost too much and we only interested in the sum of their sizes > anyway. What we actually store is the sum, so operations like "transfer > the reference from one component to another" are not immediately obvious > to be refcounting ones - the sum is unchanged. Conceptually, though, > they are refcounting operations. > Up to d_instantiate() we are holding a reference to inode; > after that we are *not* - it has been transferred to dentry. That's > why on subsequent failure exits we do not call iput() - the inode > reference is not ours to discard anymore. > In the same way, up to inode->i_private = ns; we are holding > a reference to ns. After that we are not - it's been transferred to > inode. From that point on it's not ours to discard; it will be > dropped when inode gets destroyed, whenever that happens.