RE: [RFC]Pid conversion between pid namespace

Hi,

> -----Original Message-----
> From: Serge Hallyn [mailto:serge.hallyn@xxxxxxxxxx]
> Sent: Tuesday, August 05, 2014 6:21 AM
> 
> Quoting chenhanxiao@xxxxxxxxxxxxxx (chenhanxiao@xxxxxxxxxxxxxx):
> > Hi,
> >
> > We discussed two ways of pid conversion:
> > syscall and procfs.
> >
> > Both of them could do a pid translation job.
> > But for ns hierarchy, syscall like:
> >
> > pid_t* getnspid(pid_t query_pid, pid_t observer_pid)
> > or
> > pid_t getnspid(pid_t query_pid, int query_fd, int ref_fd)
> >
> > could not work, we knew a pid lived in one ns, but we
> 
> Note I still disagree here. 
> 
> > did not know their relationships.
> > For getting the entire set of pids, both of them can do.
> >
> > So using procfs is a better way.
> >
> > Ex:
> >     init_pid_ns     ns1         ns2
> > t1  2
> > t2   `- 3           1
> > t3       `- 4       `- 5        1
> > t4           `-6        `-8      `-9
> > t5             `-10        `-9      `-10
> >
> > 1. How procfs work:
> > a) adding a nspid hierarchy  under /proc/ like:
> > [root@localhost proc]# tree /proc/nspid
> > /proc/nspid
> > ├── ns0
> > │    └── ns1
> 
> Are these actually called 'ns1' etc?  Adding a namespace of pid
> namespace names is a bad thing.

That's just an example.
We are inclined to name each directory ns$(inum),
following what proc_ns_readlink does.
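
To make the naming concrete, here is a minimal sketch of how a tool could
derive such a directory name from the readlink result of /proc/PID/ns/pid,
which looks like pid:[4026531836]. The helper names parse_ns_link and
nspid_dir_name are made up for illustration, and the "ns_<inum>" spelling is
only an assumption taken from the example tree later in this mail.

```python
import re

# readlink on /proc/PID/ns/pid yields a string such as "pid:[4026531836]".
NS_LINK_RE = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target into (type, inode number)."""
    match = NS_LINK_RE.match(target)
    if match is None:
        raise ValueError("not a namespace link target: %r" % target)
    return match.group(1), int(match.group(2))


def nspid_dir_name(target):
    """Hypothetical directory name under the proposed /proc/nspid tree."""
    ns_type, inum = parse_ns_link(target)
    if ns_type != "pid":
        raise ValueError("expected a pid namespace link, got %r" % ns_type)
    return "ns_%d" % inum


print(nspid_dir_name("pid:[4026531836]"))
```

On a running system the input string would come from
os.readlink("/proc/PID/ns/pid"); the sketch takes it as an argument so that
it does not depend on any particular machine.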

> 
> > │       ├── ns2
> > │       │   └── pid -> /proc/9/ns
> > │       └── pid -> /proc/4/ns
> > └── pid -> /proc/1/ns
> >
> > We created dirs and add a link to the 1st process of this ns.
> 
> How much more kernel space does this take up?
> 

Only the first process of each newly created ns will be added here,
so there would not be many entries.

> Is there an easy way to go from a pid in your own namespace
> to its proper node under /proc/nspid?  I.e. if I am interested
> in pid 9987, which happens to be pid 5 inside a container in
> ns2, and then I want to know what it means when it (pid 9987)
> is talking about 'pid 10'.  Is there a link under /proc/9987/
> leading to /proc/nspid/ns2/5 ?

If you want to query pid 9987, you could:
a) readlink /proc/9987/ns/pid
b) refer to /proc/nspid/ns$(inum)/ns$(inum)..
c) find the link to the first process of that ns under ns$(inum).

Or, as you suggested above,
we could make some changes to /proc/PID/ns/pid:
a) when a new ns is created, we put it under /proc/nspid
b) create a link from /proc/PID/ns/pid to /proc/nspid/ns$(inum)/pid

Then we could get a clearer view:
1. pidns view
/proc/nspid
├── ns_4026531836	(ns0)
│  ├─ ns1
│  │   ├─── ns2
│  │   └── pid -> pid:[4026531836]
│  └── pid -> pid:[4026531816]
└── pid -> pid:[4026531806]

Then there will be a link from /proc/9987/ns/pid to ns2:
2. PID1 lives in ns0, PID2 lives in ns2
/proc/PID1/ns/pid->/proc/nspid/ns_4026531806

/proc/PID2/ns/pid->/proc/nspid/ns_4026531836
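
As a rough sketch of how such a tree would be used, the hierarchy can be
modelled as nested dictionaries keyed by inum. The inums below are the ones
from the hierarchy example in my earlier summary mail; HIERARCHY, find_path
and is_ancestor are illustrative names only, and a real tool would build the
structure by walking the proposed /proc/nspid directory.

```python
# Nested model of the proposed hierarchy: each key is a pid namespace inum,
# each value holds its child namespaces.
HIERARCHY = {
    4026531836: {
        4026532390: {4026532484: {}},
        4026532491: {},
    },
}


def find_path(tree, inum, prefix=()):
    """Return the chain of inums from the root down to inum, or None."""
    for node, children in tree.items():
        path = prefix + (node,)
        if node == inum:
            return path
        found = find_path(children, inum, path)
        if found is not None:
            return found
    return None


def is_ancestor(tree, ancestor, descendant):
    """True if ancestor is a strict ancestor namespace of descendant."""
    path = find_path(tree, descendant)
    return path is not None and ancestor in path[:-1]


print(find_path(HIERARCHY, 4026532484))
print(is_ancestor(HIERARCHY, 4026532491, 4026532484))
```

With the inum from readlink /proc/PID/ns/pid, find_path gives the nesting
level of a process and is_ancestor answers whether two namespaces are
related.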

> 
> > b) expose all sets of pid, pgid, sid and tgid
> > via expanded /proc/PID/status
> >       We could get translated IDs from container like:
> >     NStgid:	6 	8	9
> >     NSpid:	6 	8 	9
> >     NSpgid:	6 	8 	9
> >     NSsid:	6 	1 	0
> >     (a set of IDs with 3 level of ns)
> 
> This sure does seem the simplest route.  But it actually still
> does not provide us an easy answer to "what does pid 9987 mean
> when it talks about pid 10?".

Do you mean:
init_pid_ns   ns1     ns2
9987            10      5
Neither the getnspid syscall nor the /proc/PID/status expansion
could answer this without hierarchy information.
For users in init_pid_ns, getnspid needs
an observer pid that lives in ns1 and no deeper,
or else getnspid has to be called from inside ns1.
See below for more.
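
For reference, a minimal sketch of reading the proposed NS* lines from a
status file. The field names and sample values are the ones from the
proposal quoted above; parse_ns_ids and SAMPLE_STATUS are made-up names, and
the exact whitespace of the final format is an assumption.

```python
# Sample text in the proposed /proc/PID/status format: one ID per namespace
# level, outermost namespace first.
SAMPLE_STATUS = (
    "NStgid:\t6 \t8 \t9\n"
    "NSpid:\t6 \t8 \t9\n"
    "NSpgid:\t6 \t8 \t9\n"
    "NSsid:\t6 \t1 \t0\n"
)


def parse_ns_ids(status_text):
    """Map each NS* field name to its list of IDs, outermost level first."""
    result = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.startswith("NS"):
            result[key] = [int(token) for token in value.split()]
    return result


ids = parse_ns_ids(SAMPLE_STATUS)
print(ids["NSpid"])
```

The lists tell us the ID at every level, but not which namespace each level
is. That part still has to come from the hierarchy.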

> 
> > 2. Advantage of procfs solution
> > a) easy to use:
> > getnspid(6, 10) -> (10, 9, 10)
> > or
> > getnspid(10, ns1_fd, ns0_fd) -> 9
> > getnspid(10, ns2_fd, ns0_fd) -> 10
> >
> > And we could also get it by:
> > cat /proc/10/status | grep NSpid:
> > NSpid:	10 	9 	10
> > ...
> 
> It looks nice, but I'm not convinced it gives us the info we
> need.
> 
> It's certainly possible that I've just not thought it through
> enough.
> 
> Question: are you proposing this (/proc/pid/status expansion) as an
> alternative to /proc/nspid, or are they meant to be complementary?
> 

We want /proc/nspid to complement the /proc/PID/status expansion for pid translation.
Ex:
    init_pid_ns     ns1         ns2
t1  2
t2   `- 3           1 
t3       `- 4       `- 5        1
t4           `-6        `-8      `-9
t5             `-10        `-9      `-10
Suppose we are in init_pid_ns:
getnspid(9,4)->6 (t4)
getnspid(9,3)->10(t5)
We know that t2 is in ns1 and t3 is in ns2, but we don't know how the two namespaces are related.
If we want to query pid 9 in ns1, we could use getnspid(9,3)->10(t5),
but the prerequisite is that we know ns2 is the child of ns1.
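
The two calls above can be reproduced with a small user-space model of the
table. This is only a toy sketch and not the syscall itself: TASKS is the
table written out by hand, and the model assumes a single chain
init_pid_ns -> ns1 -> ns2, which is exactly the hierarchy knowledge that
the syscall alone does not give us.

```python
# Pid of each task per namespace level, outermost first
# (index 0 = init_pid_ns, 1 = ns1, 2 = ns2), copied from the table above.
TASKS = {
    "t1": [2],
    "t2": [3, 1],
    "t3": [4, 5, 1],
    "t4": [6, 8, 9],
    "t5": [10, 9, 10],
}


def getnspid(query_pid, observer_pid):
    """Toy model of the call as seen from init_pid_ns.

    observer_pid is given in init_pid_ns, query_pid is interpreted in the
    observer's own (deepest) namespace, and the result is the pid of the
    matching task in init_pid_ns, or None if there is no match.
    """
    observer = next((p for p in TASKS.values() if p[0] == observer_pid), None)
    if observer is None:
        return None
    level = len(observer) - 1
    for pids in TASKS.values():
        if len(pids) > level and pids[level] == query_pid:
            return pids[0]
    return None


print(getnspid(9, 4))  # observer t3 lives in ns2, pid 9 there is t4
print(getnspid(9, 3))  # observer t2 lives in ns1, pid 9 there is t5
```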

Thanks,
-Chen

> > b) hierarchy info:
> > We could not get the ns hierarchy info by just one syscall.
> > If we had to, it will complicate the interface.
> 
> Agreed.  But I'm not sure that's particularly important.
> 
> > We could check whether two process had some relations
> > via procfs:
> > readlink /proc/PID1/ns/pid -> aaa
> > readlink /proc/PID2/ns/pid -> bbb
> >
> > Then we could check /proc/nspid/nsX/nsY/nsZ
> > and find out their relationship.
> > Ex:
> > We know t4 live in ns2,
> > readlink /proc/t4/ns/pid -> AAA
> > then we refer to /proc/nspid/ and find a same inum AAA under
> > /proc/nspid/ns0/ns1/ns2
> > Then we knew that t4 have pid 9 in ns2, have pid 8 in ns1.
> >
> > Any comments would be warmly welcomed!
> >
> > Thanks,
> > - Chen
> >
> > > -----Original Message-----
> > > From: containers-bounces@xxxxxxxxxxxxxxxxxxxxxxxxxx
> > > [mailto:containers-bounces@xxxxxxxxxxxxxxxxxxxxxxxxxx] On Behalf Of
> > > chenhanxiao@xxxxxxxxxxxxxx
> > > Sent: Wednesday, July 09, 2014 6:34 PM
> > > To: Eric W. Biederman (ebiederm@xxxxxxxxxxxx); Serge Hallyn
> > > (serge.hallyn@xxxxxxxxxx); Oleg Nesterov (oleg@xxxxxxxxxx); Richard Weinberger
> > > (richard@xxxxxx); Pavel Emelyanov (xemul@xxxxxxxxxxxxx); Vasily Kulikov
> > > (segoon@xxxxxxxxxxxx); Gotou, Yasunori/五島 康文; 'Daniel P. Berrange
> > > (berrange@xxxxxxxxxx)'
> > > Cc: containers@xxxxxxxxxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> > > Subject: RE: [RFC]Pid conversion between pid namespace
> > >
> > > Hi,
> > >
> > > Let me summarize our discussions of ID conversion by pros/cons:
> > >
> > > A) make new system call for translation
> > >     A-1) systemcall(ID, NS1, NS2) into (ID).
> > >     pros:
> > >         - has a reference ns(NS2)
> > >           We could get any lower level ID directly.
> > >
> > >     cons:
> > >         - lack of hierarchy information.
> > >           CRIU need hierarchy info for checkpoint/restore in nested containers.
> > >         - not easy for debug.
> > >           And a lot of tools/libs need be modified.
> > >
> > >     A-2) syscall pid_t getnspid(pid_t query_pid, pid_t observer_pid)
> > >     pros:
> > >         - ns procfs free, easy to use.
> > >         We could get rid of mounted ns procfs.
> > >
> > >     cons:
> > >         - may find multiple results in nested ns.
> > >           We wished the new API could tell us the exact answer.
> > >           But if getnspid returns more than one result, it will bring trouble to
> > >           admins: they have to make another decision.
> > >           Or we marked the deepest level for translation as prerequisite.
> > >
> > >         -based on current pidns, no reference ns.
> > >
> > > B) make/change proc file/directories
> > > 	B-1) expand /proc/pid/status
> > > 	pros:
> > >         - easy to use and to debug
> > >         - already had existed interface in kernel
> > >
> > > 	cons:
> > >         - based on current ns
> > >           for middle level, we had to make another decision.
> > >         - do not have hierarchy info.
> > >
> > > 	B-2) /proc/<pidX>/ns/proc/ which would contain everything
> > > 	pros:
> > >         - have enough info from /proc in container
> > >
> > > 	cons:
> > >         - Requirements unclear.
> > >           We need more discussion to decide which items should not be exposed.
> > >         - do not have hierarchy info.
> > >
> > >
> > > How about do these things in two steps:
> > >
> > > C)  1. expose all sets of pid, pgid, sid and tgid
> > > via expanded /proc/PID/status
> > >       We could get translated IDs from container like:
> > >     NStgid:	16465 	5 	1
> > >     NSpid:	16465 	5 	1
> > >     NSpgid:	16465 	5 	1
> > >     NSsid:	16423 	1 	0
> > >     (a set of IDs with 3 level of ns)
> > >
> > >     2. add hierarchy info under /proc
> > >       We lack a method of getting hierarchy info, which is useful.
> > >       With it we could know the relationship between namespaces.
> > >       How about adding a new proc file just under /proc
> > >       to show the hierarchy like readlink did:
> > >       pid:[4026531836]-> [4026532390] -> [4026532484]
> > >       pid:[4026531836]-> [4026532491]
> > >       (A 3 level pid and 2 level pid)
> > >
> > > Any comments would be appreciated.
> > >
> > > Thanks,
> > > - Chen
> > >
> > > > -----Original Message-----
> > > > Subject: [RFC]Pid conversion between pid namespace
> > > >
> > > > Hi,
> > > >
> > > > We had some discussions on how to carry out
> > > > pid conversion between pid namespace via:
> > > > syscall[1] and procfs[2].
> > > >
> > > > Pavel suggested that a syscall like
> > > > (ID, NS1, NS2) into (ID).
> > > >
> > > > Serge suggested that a syscall
> > > > pid_t getnspid(pid_t query_pid, pid_t observer_pid).
> > > >
> > > >
> > > > Eric and Richard suggested a procfs solution is
> > > > more appropriate.
> > > >
> > > > Oleg suggested that we should expand /proc/pid/status
> > > > to report this kind of information.
> > > >
> > > > And Richard suggested adding a directory like
> > > > /proc/<pidX>/ns/proc/ which would contain everything
> > > > from /proc/<pidX inside the namespace>/.
> > > >
> > > > As procfs provided a more user friendly interface,
> > > > how about expose all sets of tgid, pid, pgid, sid
> > > > by expanding /proc/PID/status in procfs?
> > > > And we could also expose ns hierarchy under /proc,
> > > > which could be another reference.
> > > >
> > > > Ex:
> > > >     init_pid_ns    ns1         ns2
> > > > t1  2
> > > > t2   `- 3          1
> > > > t3       `- 4      `- 5        1
> > > >
> > > > We could get in /proc/t3/status:
> > > > NSpid: 4 5 1
> > > > We knew that pid 1 in container is pid 4 in init ns.
> > > >
> > > > And we could get ns hierarchy under /proc/ns_hierarchy like:
> > > > init_ns->ns1->ns2		(as the result of readlink)
> > > >          ->ns3
> > > > We knew that t3 in ns2, and its hierarchy.
> > > >
> > > > How do these ideas look?
> > > > Any comments would be appreciated.
> > > >
> > > > Thanks,
> > > > - Chen
> > > >
> > > >
> > > > a) syscall
> > > > http://lwn.net/Articles/602987/
> > > >
> > > > b) procfs
> > > > http://www.spinics.net/lists/kernel/msg1751688.html
> > > >
> > > > _______________________________________________
> > > > Containers mailing list
> > > > Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
> > > > https://lists.linuxfoundation.org/mailman/listinfo/containers
