Re: [PATCH v4] pidns: introduce syscall translate_pid

"prakash.sangappa" <prakash.sangappa@xxxxxxxxxx> · Mon, 16 Oct 2017 15:54:24 -0700

On 10/16/2017 03:07 PM, Nagarathnam Muthusamy wrote:

On 10/16/2017 02:36 PM, Andrew Morton wrote:
On Sat, 14 Oct 2017 11:17:47 +0300 Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx> wrote:

pid_t translate_pid(pid_t pid, int source, int target);

This syscall converts a pid from the source pid-ns into the pid in the target
pid-ns. If the pid is unreachable from the target pid-ns, it returns zero.

Pid-namespaces are referred to by file descriptors opened to the proc files
/proc/[pid]/ns/pid or /proc/[pid]/ns/pid_for_children. A negative argument
refers to the current pid namespace, same as the file /proc/self/ns/pid.

The kernel exposes virtual pids in /proc/[pid]/status:NSpid, but backward
translation requires scanning all tasks. Pids could also be translated by
sending them through a unix socket between namespaces, but this method is
slow and insecure because the other side is exposed inside the pid namespace.
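The forward direction via NSpid is cheap: one status file gives a task's pid in every pid namespace it belongs to. A minimal C sketch of that lookup follows; `parse_nspid_line` and `read_nspid` are illustrative helper names, and the layout assumed (whitespace-separated pids, outermost namespace first, innermost last) is the one proc(5) documents for kernels that provide the NSpid field.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Parse an "NSpid:" line from /proc/[pid]/status into `out`.
 * Entries run from the pid namespace of the procfs mount (first)
 * to the task's own, innermost namespace (last).
 * Returns the number of entries stored, or -1 if this is not an NSpid line.
 */
static int parse_nspid_line(const char *line, pid_t *out, int max)
{
    static const char prefix[] = "NSpid:";
    if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
        return -1;

    const char *p = line + sizeof(prefix) - 1;
    int n = 0;
    while (n < max) {
        char *end;
        long v = strtol(p, &end, 10);   /* skips leading tabs/spaces */
        if (end == p)
            break;                      /* no more numbers on the line */
        out[n++] = (pid_t)v;
        p = end;
    }
    return n;
}

/* Read the NSpid entries of `pid` as seen through this process's /proc. */
static int read_nspid(pid_t pid, pid_t *out, int max)
{
    char path[64], line[256];
    snprintf(path, sizeof(path), "/proc/%d/status", (int)pid);

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    int n = -1;
    while (fgets(line, sizeof(line), f)) {
        n = parse_nspid_line(line, out, max);
        if (n >= 0)
            break;                      /* found the NSpid line */
    }
    fclose(f);
    return n;
}
```

This only answers "what is this task called in each namespace"; going the other way, from a pid inside a child namespace back to the caller's view, still needs a scan over all tasks.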
Andrew asked why we might need this.

Such conversion is required for interaction between processes across
pid-namespaces, for example to identify a process in a container by its
pid file when looking from outside.

Two years ago I solved this in a project of mine with monstrous code which
forks a couple of times just to convert a pid; luckily for me, performance
wasn't important.

That's a single user who needed this a single time, and found a
userspace-based solution anyway.  This is not exactly compelling!

Is there a stronger case to be made?  How does this change benefit our
users?  Sell it to us!

Oracle Database is planning to use pid namespaces for sandboxing
database instances, and it needs an API similar to translate_pid to
effectively translate process IDs from other pid namespaces. Prakash
(cc'd on this mail) can provide more details on this use case.

As Nagarathnam indicated, Oracle Database will be using pid namespaces
and needs a direct method of converting the pids of processes in the pid
namespace hierarchy. In this use case, multiple nested PID namespaces
will be used. The currently available mechanisms are not very efficient
for this use case. For example, as Konstantin described, using
/proc/<pid>/status would require the application to scan every process's
status file to determine the pid of a given process in a child namespace.
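To make the cost of the /proc approach concrete, here is a hedged C sketch of that reverse lookup: given a process `ns_owner` in the child namespace and a pid `vpid` as seen inside it, find the pid the caller sees. It walks every /proc/[pid] entry, identifies the pid namespace by the device and inode of /proc/[pid]/ns/pid, and reads each candidate's NSpid line, so every single lookup costs a pass over all tasks. `scan_for_vpid` and `innermost_nspid` are hypothetical names; for brevity the sketch only matches tasks that live directly in that namespace (not deeper nested ones) and only thread-group leaders, since thread ids are listed under /proc/[pid]/task/ rather than in /proc itself.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Last (innermost) NSpid entry of /proc/<pid_dir>/status, or -1. */
static pid_t innermost_nspid(const char *pid_dir)
{
    char path[300], line[256];
    snprintf(path, sizeof(path), "/proc/%s/status", pid_dir);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    pid_t last = -1;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "NSpid:", 6) != 0)
            continue;
        const char *p = line + 6;
        for (;;) {
            char *end;
            long v = strtol(p, &end, 10);
            if (end == p)
                break;
            last = (pid_t)v;
            p = end;
        }
        break;
    }
    fclose(f);
    return last;
}

/*
 * Hypothetical helper: pid (in this process's /proc) of the task that the
 * pid namespace of `ns_owner` calls `vpid`. Returns 0 if no task matches,
 * -1 if the target namespace cannot be inspected.
 */
static pid_t scan_for_vpid(pid_t ns_owner, pid_t vpid)
{
    char path[300];
    struct stat target, st;

    snprintf(path, sizeof(path), "/proc/%d/ns/pid", (int)ns_owner);
    if (stat(path, &target) != 0)
        return -1;

    DIR *proc = opendir("/proc");
    if (!proc)
        return -1;

    pid_t found = 0;
    struct dirent *de;
    while ((de = readdir(proc)) != NULL) {
        if (!isdigit((unsigned char)de->d_name[0]))
            continue;               /* not a /proc/<pid> directory */

        snprintf(path, sizeof(path), "/proc/%s/ns/pid", de->d_name);
        if (stat(path, &st) != 0)
            continue;               /* task exited, or no permission */
        if (st.st_dev != target.st_dev || st.st_ino != target.st_ino)
            continue;               /* lives in a different pid namespace */

        if (innermost_nspid(de->d_name) == vpid) {
            found = (pid_t)atoi(de->d_name);
            break;
        }
    }
    closedir(proc);
    return found;
}
```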

Using an SCM_CREDENTIALS socket message is another way, but it would
require every process starting inside a pid namespace to send this
message, and the receiving process in the target namespace would have to
save the converted pid and reference it. This mechanism becomes
cumbersome, especially if the application has to deal with multiple
nested pid namespaces. Also, the Database needs to be able to convert a
thread's global pid (gettid()). Passing the thread's pid (gettid()) in an
SCM_CREDENTIALS message requires CAP_SYS_ADMIN, which is an issue.
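For comparison, the SCM_CREDENTIALS mechanism works roughly like this minimal C sketch over an AF_UNIX socket (see unix(7)): the sender attaches a struct ucred carrying its own pid, and on delivery the kernel rewrites that pid into the receiver's pid namespace, provided the receiving socket has SO_PASSCRED enabled. The helper names `send_creds` and `recv_creds_pid` are illustrative. Each message carries exactly one pid, which is why the receiver has to record every mapping itself, and putting anything other than the caller's own process ID (such as a non-main thread's gettid() value) into `cred.pid` is rejected unless the sender has CAP_SYS_ADMIN.

```c
#define _GNU_SOURCE             /* struct ucred */
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send our own pid/uid/gid as SCM_CREDENTIALS ancillary data. */
static int send_creds(int sock)
{
    struct ucred cred = { .pid = getpid(), .uid = getuid(), .gid = getgid() };
    char byte = 0;                      /* ancillary data needs real payload */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(struct ucred))];
        struct cmsghdr align;           /* force cmsghdr alignment */
    } u;
    memset(&u, 0, sizeof(u));

    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_CREDENTIALS;
    cm->cmsg_len = CMSG_LEN(sizeof(struct ucred));
    memcpy(CMSG_DATA(cm), &cred, sizeof(cred));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one message; return the sender's pid as translated by the kernel. */
static pid_t recv_creds_pid(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(struct ucred))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    for (struct cmsghdr *cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
        if (cm->cmsg_level == SOL_SOCKET && cm->cmsg_type == SCM_CREDENTIALS) {
            struct ucred cred;
            memcpy(&cred, CMSG_DATA(cm), sizeof(cred));
            return cred.pid;
        }
    }
    return -1;                          /* no credentials attached */
}
```

Both ends here share one namespace, so the received pid is simply the sender's own; across namespaces the receiver would see the sender's pid as numbered in the receiver's namespace.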

So having a direct method, like the API that Konstantin is proposing,
will work best for the Database, since the pid of a process in any of the
nested pid namespaces can be converted as and when required. I think with
the proposed API, the application should be able to convert the pid of a
process or the tid (gettid()) of a thread as well.

-Prakash

Thanks,
Nagarathnam.

--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html