Hi,

> -----Original Message-----
> From: Serge Hallyn [mailto:serge.hallyn@xxxxxxxxxx]
> Sent: Tuesday, August 05, 2014 6:21 AM
>
> Quoting chenhanxiao@xxxxxxxxxxxxxx (chenhanxiao@xxxxxxxxxxxxxx):
> > Hi,
> >
> > We discussed two ways of pid conversion:
> > syscall and procfs.
> >
> > Both of them could do a pid translation job.
> > But for ns hierarchy, syscall like:
> >
> > pid_t* getnspid(pid_t query_pid, pid_t observer_pid)
> > or
> > pid_t getnspid(pid_t query_pid, int query_fd, int ref_fd)
> >
> > could not work, we knew a pid lived in one ns, but we
>
> Note I still disagree here.
>
> > did not know their relationships.
> > For getting the entire set of pids, both of them can do.
> >
> > So using procfs is a better way.
> >
> > Ex:
> >       init_pid_ns   ns1     ns2
> > t1    2
> > t2    `- 3          1
> > t3    `- 4          `- 5    1
> > t4    `-6           `-8     `-9
> > t5    `-10          `-9     `-10
> >
> > 1. How procfs work:
> > a) adding a nspid hierarchy under /proc/ like:
> > [root@localhost proc]# tree /proc/nspid
> > /proc/nspid
> > ├── ns0
> > │   └── ns1
>
> Are these actually called 'ns1' etc? Adding a namespace of pid
> namespace names is a bad thing.

That's just an example.
We are inclined to name each directory ns$(inum),
like what proc_ns_readlink does.

>
> > │       ├── ns2
> > │       │   └── pid -> /proc/9/ns
> > │       └── pid -> /proc/4/ns
> > └── pid -> /proc/1/ns
> >
> > We created dirs and add a link to the 1st process of this ns.
>
> How much more kernel space does this take up?
>

Only the first process of each newly created ns will be added here,
so there would not be many entries.

> Is there an easy way to go from a pid in your own namespace
> to its proper node under /proc/nspid? I.e. if I am interested
> in pid 9987, which happens to be pid 5 inside a container in
> ns2, and then I want to know what it means when it (pid 9987)
> is talking about 'pid 10'. Is there a link under /proc/9987/
> leading to /proc/nspid/ns2/5 ?

If you want to query pid 9987, you could:
a) readlink /proc/9987/ns/pid
b) refer to /proc/nspid/ns$(inum)/ns$(inum)..
c) Also, the link to the first process of the new ns can be found
   under ns$(inum).

Or, as you said above, we could make some changes to /proc/PID/ns/pid:
a) when a new ns is created, we put it under /proc/nspid
b) create a link from /proc/PID/ns/pid to /proc/nspid/ns$(inum)/pid

Then we could get a clearer view:
1. pidns view
/proc/nspid
├── ns_4026531836 (ns0)
│   ├── ns1
│   │   ├── ns2
│   │   └── pid -> pid:[4026531836]
│   └── pid -> pid:[4026531816]
└── pid -> pid:[4026531806]

Then there will be a link under /proc/9987/ns/pid to ns2:
2. PID1 lives in ns0, PID2 lives in ns2
/proc/PID1/ns/pid->/proc/nspid/ns_4026531806
/proc/PID2/ns/pid->/proc/nspid/ns_4026531836

>
> > b) expose all sets of pid, pgid, sid and tgid
> > via expanded /proc/PID/status
> > We could get translated IDs from container like:
> > NStgid: 6 8 9
> > NSpid: 6 8 9
> > NSpgid: 6 8 9
> > NSsid: 6 1 0
> > (a set of IDs with 3 level of ns)
>
> This sure does seem the simplest route. But it actually still
> does not provide us an easy answer to "what does pid 9987 mean
> when it talks about pid 10?".

Do you mean:
init_pid_ns   ns1   ns2
9987          10    5

Neither the getnspid syscall nor the /proc/PID/status expansion could
answer this without hierarchy information.
For users in init_pid_ns, getnspid needs an observer pid that lives in
ns1 and only in ns1; otherwise we have to call getnspid from inside ns1.
See below for more.

>
> > 2. Advantage of procfs solution
> > a) easy to use:
> > getnspid(6, 10) -> (10, 9, 10)
> > or
> > getnspid(10, ns1_fd, ns0_fd) -> 9
> > getnspid(10, ns2_fd, ns0_fd) -> 10
> >
> > And we could also get it by:
> > cat /proc/10/status | grep NSpid:
> > NSpid: 10 9 10
> > ...
>
> It looks nice, but I'm not convinced it gives us the info we
> need.
>
> It's certainly possible that I've just not thought it through
> enough.
>
> Question: are you proposing this (/proc/pid/status expansion) as an
> alternative to /proc/nspid, or are they meant to be complementary?
>

We want /proc/nspid as a complement for pid translation.
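To make the status expansion concrete, here is a minimal sketch of how a
tool might read the translated IDs. It assumes the status file carries the
NStgid/NSpid/NSpgid/NSsid rows exactly as proposed above, one column per
nesting level with the reader's own ns first. parse_ns_ids and pid_at_level
are hypothetical helper names used only for illustration, and the sample
text reuses the t4 row from the example.

```python
def parse_ns_ids(status_text):
    """Collect the NS* ID rows from the text of a /proc/PID/status file.

    Returns a dict such as {"NSpid": [6, 8, 9]}. Column 0 is the ID as seen
    from the reader's pid namespace; the last column is the innermost ns.
    The field names follow the proposal in this thread (assumption).
    """
    wanted = ("NStgid", "NSpid", "NSpgid", "NSsid")
    ids = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in wanted:
            ids[key.strip()] = [int(tok) for tok in value.split()]
    return ids


def pid_at_level(ns_ids, level):
    """Return the pid at a nesting level (0 = outermost ns listed)."""
    return ns_ids["NSpid"][level]


# Sample text modelled on the t4 row of the example (6 / 8 / 9).
sample = (
    "Name:\tsleep\n"
    "NStgid:\t6\t8\t9\n"
    "NSpid:\t6\t8\t9\n"
    "NSpgid:\t6\t8\t9\n"
    "NSsid:\t6\t1\t0\n"
)
ids = parse_ns_ids(sample)
print(ids["NSpid"])          # [6, 8, 9]
print(pid_at_level(ids, 1))  # 8
```

Note that the columns only give the nesting depth. They do not say which
namespace each column belongs to, which is the gap /proc/nspid is meant
to fill.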
Ex:
      init_pid_ns   ns1     ns2
t1    2
t2    `- 3          1
t3    `- 4          `- 5    1
t4    `-6           `-8     `-9
t5    `-10          `-9     `-10

Suppose we are in init_pid_ns:
getnspid(9,4)->6 (t4)
getnspid(9,3)->10(t5)

We know t2 is in ns1 and t3 is in ns2, but we do not know how the two
namespaces are related.
If we want to query pid 9 in ns1, we could use
getnspid(9,3)->10(t5)
but the prerequisite is that we know ns2 is the child of ns1.

Thanks,
-Chen

> > b) hierarchy info:
> > We could not get the ns hierarchy info by just one syscall.
> > If we had to, it will complicate the interface.
>
> Agreed. But I'm not sure that's particularly important.
>
> > We could check whether two process had some relations
> > via procfs:
> > readlink /proc/PID1/ns/pid -> aaa
> > readlink /proc/PID2/ns/pid -> bbb
> >
> > Then we could check /proc/nspid/nsX/nsY/nsZ
> > and find out their relationship.
> > Ex:
> > We know t4 live in ns2,
> > readlink /proc/t4/ns/pid -> AAA
> > then we refer to /proc/nspid/ and find a same inum AAA under
> > /proc/nspid/ns0/ns1/ns2
> > Then we knew that t4 have pid 9 in ns2, have pid 8 in ns1.
> >
> > Any comments would be warmly welcomed!
> >
> > Thanks,
> > - Chen
> >
> > > -----Original Message-----
> > > From: containers-bounces@xxxxxxxxxxxxxxxxxxxxxxxxxx
> > > [mailto:containers-bounces@xxxxxxxxxxxxxxxxxxxxxxxxxx] On Behalf Of
> > > chenhanxiao@xxxxxxxxxxxxxx
> > > Sent: Wednesday, July 09, 2014 6:34 PM
> > > To: Eric W. Biederman (ebiederm@xxxxxxxxxxxx); Serge Hallyn
> > > (serge.hallyn@xxxxxxxxxx); Oleg Nesterov (oleg@xxxxxxxxxx); Richard Weinberger
> > > (richard@xxxxxx); Pavel Emelyanov (xemul@xxxxxxxxxxxxx); Vasily Kulikov
> > > (segoon@xxxxxxxxxxxx); Gotou, Yasunori/五島 康文; 'Daniel P.
Berrange
> > > (berrange@xxxxxxxxxx)'
> > > Cc: containers@xxxxxxxxxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> > > Subject: RE: [RFC]Pid conversion between pid namespace
> > >
> > > Hi,
> > >
> > > Let me summarize our discussions of ID conversion by pros/cons:
> > >
> > > A) make new system call for translation
> > > A-1) systemcall(ID, NS1, NS2) into (ID).
> > > pros:
> > > - has a reference ns(NS2)
> > >   We could get any lower level ID directly.
> > >
> > > cons:
> > > - lack of hierarchy information.
> > >   CRIU need hierarchy info for checkpoint/restore in nested containers.
> > > - not easy for debug.
> > >   And a lot of tools/libs need be modified.
> > >
> > > A-2) syscall pid_t getnspid(pid_t query_pid, pid_t observer_pid)
> > > pros:
> > > - ns procfs free, easy to use.
> > >   We could get rid of mounted ns procfs.
> > >
> > > cons:
> > > - may find multiple results in nested ns.
> > >   We wished the new API could tell us the exact answer.
> > >   But if getnspid return more than one results will bring trouble to admins,
> > >   they had to make another decision.
> > >   Or we marked the deepest level for translation as prerequisite.
> > >
> > > - based on current pidns, no reference ns.
> > >
> > > B) make/change proc file/directories
> > > B-1) expand /proc/pid/status
> > > pros:
> > > - easy to use and to debug
> > > - already had existed interface in kernel
> > >
> > > cons:
> > > - based on current ns
> > >   for middle level, we had to make another decision.
> > > - do not have hierarchy info.
> > >
> > > B-2) /proc/<pidX>/ns/proc/ which would contain everything
> > > pros:
> > > - have enough info from /proc in container
> > >
> > > cons:
> > > - Requirements unclear.
> > >   We need more discussion to decide which items should not be exposed.
> > > - do not have hierarchy info.
> > >
> > >
> > > How about do these things in two steps:
> > >
> > > C) 1.
expose all sets of pid, pgid, sid and tgid
> > > via expanded /proc/PID/status
> > > We could get translated IDs from container like:
> > > NStgid: 16465 5 1
> > > NSpid: 16465 5 1
> > > NSpgid: 16465 5 1
> > > NSsid: 16423 1 0
> > > (a set of IDs with 3 level of ns)
> > >
> > > 2. add hierarchy info under /proc
> > > We lacked of method of getting hierarchy info, which is useful.
> > > Then we could know the relationship of ns.
> > > How about adding a new proc file just under /proc
> > > to show the hierarchy like readlink did:
> > > pid:[4026531836]-> [4026532390] -> [4026532484]
> > > pid:[4026531836]-> [4026532491]
> > > (A 3 level pid and 2 level pid)
> > >
> > > Any comments would be appreciated.
> > >
> > > Thanks,
> > > - Chen
> > >
> > > > -----Original Message-----
> > > > Subject: [RFC]Pid conversion between pid namespace
> > > >
> > > > Hi,
> > > >
> > > > We had some discussions on how to carry out
> > > > pid conversion between pid namespace via:
> > > > syscall[1] and procfs[2].
> > > >
> > > > Pavel suggested that a syscall like
> > > > (ID, NS1, NS2) into (ID).
> > > >
> > > > Serge suggested that a syscall
> > > > pid_t getnspid(pid_t query_pid, pid_t observer_pid).
> > > >
> > > >
> > > > Eric and Richard suggested a procfs solution is
> > > > more appropriate.
> > > >
> > > > Oleg suggested that we should expand /proc/pid/status
> > > > to report this kind of information.
> > > >
> > > > And Richard suggested adding a directory like
> > > > /proc/<pidX>/ns/proc/ which would contain everything
> > > > from /proc/<pidX inside the namespace>/.
> > > >
> > > > As procfs provided a more user friendly interface,
> > > > how about expose all sets of tgid, pid, pgid, sid
> > > > by expanding /proc/PID/status in procfs?
> > > > And we could also expose ns hierarchy under /proc,
> > > > which could be another reference.
> > > >
> > > > Ex:
> > > >       init_pid_ns   ns1     ns2
> > > > t1    2
> > > > t2    `- 3          1
> > > > t3    `- 4          `- 5    1
> > > >
> > > > We could get in /proc/t3/status:
> > > > NSpid: 4 5 1
> > > > We knew that pid 1 in container is pid 4 in init ns.
> > > >
> > > > And we could get ns hierarchy under /proc/ns_hierarchy like:
> > > > init_ns->ns1->ns2 (as the result of readlink)
> > > >        ->ns3
> > > > We knew that t3 in ns2, and its hierarchy.
> > > >
> > > > How these ideas looks like?
> > > > Any comments would be appreciated.
> > > >
> > > > Thanks,
> > > > - Chen
> > > >
> > > >
> > > > a) syscall
> > > > http://lwn.net/Articles/602987/
> > > >
> > > > b) procfs
> > > > http://www.spinics.net/lists/kernel/msg1751688.html
> > > >
> > > > _______________________________________________
> > > > Containers mailing list
> > > > Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
> > > > https://lists.linuxfoundation.org/mailman/listinfo/containers
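
The relationship check discussed in this thread (readlink on
/proc/PID/ns/pid, then a lookup in an exported hierarchy) could be sketched
roughly as follows. The hierarchy line format is the one floated in the
July 9 message and is only a proposal, not an existing kernel interface.
parse_ns_link, parse_chain and is_ancestor are hypothetical names, and the
inode numbers are the ones used as examples in that message.

```python
import re


def parse_ns_link(target):
    """Extract the inode number from a link target like 'pid:[4026531836]'."""
    match = re.fullmatch(r"pid:\[(\d+)\]", target.strip())
    if match is None:
        raise ValueError(f"not a pid namespace link: {target!r}")
    return int(match.group(1))


def parse_chain(line):
    """Turn 'pid:[A]-> [B] -> [C]' into [A, B, C], outermost ns first.

    The input format is the proposed hierarchy file layout (assumption).
    """
    return [int(num) for num in re.findall(r"\[(\d+)\]", line)]


def is_ancestor(chains, outer, inner):
    """True if ns `outer` appears before ns `inner` on some chain."""
    for chain in chains:
        if outer in chain and inner in chain:
            if chain.index(outer) < chain.index(inner):
                return True
    return False


# Hierarchy lines copied from the proposal (3-level and 2-level chains).
hierarchy = [
    "pid:[4026531836]-> [4026532390] -> [4026532484]",
    "pid:[4026531836]-> [4026532491]",
]
chains = [parse_chain(line) for line in hierarchy]

# Stand-ins for the results of readlink /proc/PID1/ns/pid and /proc/PID2/ns/pid.
ns_a = parse_ns_link("pid:[4026532390]")
ns_b = parse_ns_link("pid:[4026532484]")
print(is_ancestor(chains, ns_a, ns_b))        # True
print(is_ancestor(chains, 4026532491, ns_b))  # False
```

On a live system the link targets would come from
os.readlink("/proc/<PID>/ns/pid") rather than from literals. The chain
lookup is the part that needs hierarchy information, which neither the
syscall nor the status expansion provides on its own.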