On Mon, Aug 03, 2020 at 01:03:17PM +0300, Kirill Tkhai wrote:
> On 31.07.2020 01:13, Eric W. Biederman wrote:
> > Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> writes:
> >
> >> On 30.07.2020 17:34, Eric W. Biederman wrote:
> >>> Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> writes:
> >>>
> >>>> Currently, there is no a way to list or iterate all or subset of namespaces
> >>>> in the system. Some namespaces are exposed in /proc/[pid]/ns/ directories,
> >>>> but some also may be as open files, which are not attached to a process.
> >>>> When a namespace open fd is sent over unix socket and then closed, it is
> >>>> impossible to know whether the namespace exists or not.
> >>>>
> >>>> Also, even if namespace is exposed as attached to a process or as open file,
> >>>> iteration over /proc/*/ns/* or /proc/*/fd/* namespaces is not fast, because
> >>>> this multiplies at tasks and fds number.
> >>>
> >>> I am very dubious about this.
> >>>
> >>> I have been avoiding exactly this kind of interface because it can
> >>> create rather fundamental problems with checkpoint restart.
> >>
> >> restart/restore :)
> >>
> >>> You do have some filtering and the filtering is not based on current.
> >>> Which is good.
> >>>
> >>> A view that is relative to a user namespace might be ok. It almost
> >>> certainly does better as it's own little filesystem than as an extension
> >>> to proc though.
> >>>
> >>> The big thing we want to ensure is that if you migrate you can restore
> >>> everything. I don't see how you will be able to restore these files
> >>> after migration. Anything like this without having a complete
> >>> checkpoint/restore story is a non-starter.
> >>
> >> There is no difference between files in /proc/namespaces/ directory and /proc/[pid]/ns/.
> >>
> >> CRIU can restore open files in /proc/[pid]/ns, the same will be with /proc/namespaces/ files.
> >> As a person who worked deeply for pid_ns and user_ns support in CRIU, I don't see any
> >> problem here.
> >
> > An obvious difference is that you are adding the inode to the inode to
> > the file name. Which means that now you really do have to preserve the
> > inode numbers during process migration.
> >
> > Which means now we have to do all of the work to make inode number
> > restoration possible. Which means now we need to have multiple
> > instances of nsfs so that we can restore inode numbers.
> >
> > I think this is still possible but we have been delaying figuring out
> > how to restore inode numbers long enough that may be actual technical
> > problems making it happen.
>
> Yeah, this matters. But it looks like here is not a dead end. We just need
> change the names the namespaces are exported to particular fs and to support
> rename().
>
> Before introduction a principally new filesystem type for this, can't
> this be solved in current /proc?

Do you mean introducing names for namespaces that users will be able
to change? By default, the name could be a UUID.

I also have a suggestion about the structure of /proc/namespaces/.

Each namespace is owned by one of the user namespaces. Maybe it makes
sense to group namespaces by their owning user namespace?

/proc/namespaces/
                 user
                 mnt-X
                 mnt-Y
                 pid-X
                 uts-Z
                 user-X/
                        user
                        mnt-A
                        mnt-B
                        user-C
                        user-C/
                               user
                 user-Y/
                        user

Are we trying to invent a cgroupfs for namespaces?

Thanks,
Andrei
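
A rough userspace sketch of the grouping suggested above, to make the
layout concrete. It is not kernel code and not part of any patch: the
function name render_tree, the EXAMPLE table and the "init" label for
the initial user namespace are all made up for illustration. The input
is a list of (namespace, owning user namespace) pairs. On a real system
the owner could be looked up with the NS_GET_USERNS ioctl, which this
sketch does not call. As a simplification, a nested user namespace is
shown only as a directory, without the separate file entry.

```python
from collections import defaultdict

# Hypothetical sample data: (namespace name, name of the owning user namespace).
# "init" stands for the initial user namespace; all names are illustrative.
EXAMPLE = [
    ("mnt-X", "init"), ("mnt-Y", "init"), ("pid-X", "init"), ("uts-Z", "init"),
    ("user-X", "init"), ("user-Y", "init"),
    ("mnt-A", "user-X"), ("mnt-B", "user-X"), ("user-C", "user-X"),
]


def render_tree(namespaces, root="init"):
    """Return the lines of a /proc/namespaces/-style listing in which every
    namespace is placed under the user namespace that owns it."""
    owned = defaultdict(list)
    for name, owner in namespaces:
        owned[owner].append(name)

    def walk(userns, depth):
        pad = "    " * depth
        # Each directory starts with an entry for the user namespace itself.
        lines = [pad + "user"]
        # Non-user namespaces owned by this user namespace are plain entries.
        for name in sorted(n for n in owned[userns] if not n.startswith("user-")):
            lines.append(pad + name)
        # Owned user namespaces become subdirectories, handled recursively.
        for name in sorted(n for n in owned[userns] if n.startswith("user-")):
            lines.append(pad + name + "/")
            lines.extend(walk(name, depth + 1))
        return lines

    return ["/proc/namespaces/"] + walk(root, 1)


if __name__ == "__main__":
    print("\n".join(render_tree(EXAMPLE)))
```

With a layout like this, the depth of a directory mirrors the nesting of
user namespaces, which is where the cgroupfs comparison comes from.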