Re: [PATCH v4] pidns: introduce syscall translate_pid

nagarathnam muthusamy <nagarathnam.muthusamy@xxxxxxxxxx> · Wed, 01 Nov 2017 09:59:55 -0700

I believe all the questions raised in this thread were answered. Just 
wondering if there are any outstanding questions?

Thanks,
Nagarathnam.
On 10/17/2017 3:53 PM, prakash sangappa wrote:

On 10/17/2017 3:40 PM, Andy Lutomirski wrote:
On Tue, Oct 17, 2017 at 3:35 PM, prakash sangappa
<prakash.sangappa@xxxxxxxxxx> wrote:
On 10/17/2017 3:02 PM, Andy Lutomirski wrote:
On Tue, Oct 17, 2017 at 8:38 AM, Prakash Sangappa
<prakash.sangappa@xxxxxxxxxx> wrote:

On 10/16/17 5:52 PM, Andy Lutomirski wrote:
On Mon, Oct 16, 2017 at 3:54 PM, prakash.sangappa
<prakash.sangappa@xxxxxxxxxx> wrote:

On 10/16/2017 03:07 PM, Nagarathnam Muthusamy wrote:

On 10/16/2017 02:36 PM, Andrew Morton wrote:
On Sat, 14 Oct 2017 11:17:47 +0300 Konstantin Khlebnikov
<khlebnikov@xxxxxxxxxxxxxx> wrote:

pid_t translate_pid(pid_t pid, int source, int target);

This syscall converts pid from source pid-ns into pid in target pid-ns.
If pid is unreachable from target pid-ns it returns zero.

Pid-namespaces are referred to by file descriptors opened on the proc
files /proc/[pid]/ns/pid or /proc/[pid]/ns/pid_for_children. A negative
argument refers to the current pid namespace, same as file
/proc/self/ns/pid.
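To make the proposed calling convention concrete, here is a minimal userspace sketch. It assumes the prototype quoted above; __NR_translate_pid is hypothetical (no released kernel header defines it), so without it the wrapper fails with ENOSYS rather than guessing a syscall number. The helper pid_from_task_ns() is likewise a hypothetical name used only for illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper for the proposed syscall. __NR_translate_pid is an
 * assumption: if the headers do not define it, fail with ENOSYS instead of
 * invoking some unrelated syscall number. */
static pid_t translate_pid(pid_t pid, int source, int target)
{
#ifdef __NR_translate_pid
    return (pid_t)syscall(__NR_translate_pid, pid, source, target);
#else
    (void)pid; (void)source; (void)target;
    errno = ENOSYS;
    return -1;
#endif
}

/* Hypothetical helper: translate `pid`, as seen inside the pid namespace
 * of task `ns_owner`, into the caller's own pid namespace.
 * Returns the translated pid, 0 if unreachable, or -1 on error. */
static pid_t pid_from_task_ns(pid_t ns_owner, pid_t pid)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/ns/pid", (int)ns_owner);
    int fd = open(path, O_RDONLY | O_CLOEXEC);   /* source pid-ns fd */
    if (fd < 0)
        return -1;
    pid_t res = translate_pid(pid, fd, -1);      /* -1: current pid-ns */
    int saved = errno;
    close(fd);
    errno = saved;
    return res;
}
```

For example, pid_from_task_ns(container_init, 42) would ask for the local pid of whatever is pid 42 inside that task's namespace, with a single call instead of a scan.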

The kernel exposes virtual pids in /proc/[pid]/status:NSpid, but
backward translation requires scanning all tasks. Pids could also be
translated by sending them through a unix socket between namespaces, but
this method is slow and insecure because the other side is exposed
inside the pid namespace.
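The "scanning all tasks" cost can be illustrated with a rough sketch of the /proc-based fallback. It assumes NSpid lists a task's pid from the namespace of the /proc mount down to its innermost namespace; parse_nspid() and scan_for_nspid() are hypothetical names. It only walks thread-group leaders and does not disambiguate sibling namespaces (that would also require comparing /proc/<n>/ns/pid), which is part of why a direct call is attractive.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Parse the "NSpid:" line of a /proc/[pid]/status text into ids[].
 * Returns the number of entries parsed (0 if there is no NSpid line). */
static int parse_nspid(const char *status, pid_t *ids, int max)
{
    const char *p = strstr(status, "NSpid:");
    if (!p)
        return 0;
    p += strlen("NSpid:");
    int n = 0;
    while (n < max) {
        while (*p == ' ' || *p == '\t')
            p++;
        if (!isdigit((unsigned char)*p))
            break;                      /* end of line */
        char *end;
        ids[n++] = (pid_t)strtol(p, &end, 10);
        p = end;
    }
    return n;
}

/* Brute-force backward translation: find a task whose pid `depth` levels
 * below the /proc namespace equals child_pid and return its pid in the
 * /proc namespace. Returns 0 if none is found, -1 if /proc is unreadable.
 * Caveat: sibling namespaces at the same depth can reuse the same pid. */
static pid_t scan_for_nspid(pid_t child_pid, int depth)
{
    DIR *d = opendir("/proc");
    if (!d)
        return -1;
    struct dirent *de;
    pid_t found = 0;
    char path[300], buf[4096];
    while (!found && (de = readdir(d)) != NULL) {
        if (!isdigit((unsigned char)de->d_name[0]))
            continue;                   /* not a pid directory */
        snprintf(path, sizeof(path), "/proc/%s/status", de->d_name);
        FILE *f = fopen(path, "r");
        if (!f)
            continue;                   /* task may have exited meanwhile */
        size_t len = fread(buf, 1, sizeof(buf) - 1, f);
        fclose(f);
        buf[len] = '\0';
        pid_t ids[32];
        int n = parse_nspid(buf, ids, 32);
        if (n > depth && ids[depth] == child_pid)
            found = ids[0];
    }
    closedir(d);
    return found;
}
```

Every lookup opens and parses one status file per process on the system, so its cost grows with the total number of tasks rather than staying constant.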
Andrew asked why we might need this.

Such conversion is required for interaction between processes across
pid-namespaces. For example, to identify a process in a container by its
pid file when looking from outside.

Two years ago I solved this in a project of mine with monstrous code
which forks a couple of times just to convert a pid; luckily for me,
performance wasn't important.
That's a single user who needed this a single time, and found a
userspace-based solution anyway.  This is not exactly compelling!

Is there a stronger case to be made?  How does this change benefit our
users?  Sell it to us!
Oracle Database is planning to use pid namespaces for sandboxing
database instances, and it needs an API similar to translate_pid to
effectively translate process IDs from other pid namespaces. Prakash
(cc'ed on this mail) can provide more details on this use case.

As Nagarathnam indicated, Oracle Database will be using pid namespaces
and needs a direct method of converting pids of processes in the pid
namespace hierarchy. In this use case, multiple nested PID namespaces
will be used. The currently available mechanisms are not very efficient
for this use case. For example, as Konstantin described, using
/proc/<pid>/status would require the application to scan the status
files of all pids to determine the pid of a given process in a child
namespace.

Using SCM_CREDENTIALS socket messages is another way, but it would
require every process starting inside a pid namespace to send this
message, and the receiving process in the target namespace would have to
save the converted pid and reference it. This mechanism becomes
cumbersome, especially if the application has to deal with multiple
nested pid namespaces. Also, the Database needs to be able to convert a
thread's global pid (gettid()). Passing a thread's pid (gettid()) in an
SCM_CREDENTIALS message requires CAP_SYS_ADMIN, which is an issue.
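For comparison, the SCM_CREDENTIALS route looks roughly like this. It is a sketch assuming Linux AF_UNIX sockets with SO_PASSCRED enabled on the receiving end; send_creds() and recv_creds_pid() are hypothetical names. The kernel translates the pid in struct ucred into the receiver's namespace, but the sender has to cooperate, and an unprivileged sender may only claim its own getpid().

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send the caller's own credentials. Claiming a non-leader thread's
 * gettid() instead of getpid() would require CAP_SYS_ADMIN. */
static int send_creds(int sock)
{
    struct ucred uc = { .pid = getpid(), .uid = getuid(), .gid = getgid() };
    char data = 'x';                    /* at least one payload byte */
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(struct ucred))];
        struct cmsghdr align;           /* forces correct alignment */
    } u;
    memset(&u, 0, sizeof(u));
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = u.buf,
                          .msg_controllen = sizeof(u.buf) };
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_CREDENTIALS;
    c->cmsg_len = CMSG_LEN(sizeof(struct ucred));
    memcpy(CMSG_DATA(c), &uc, sizeof(uc));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one message and return the sender's pid as seen from the
 * receiver's pid namespace, or -1 if no credentials arrived. The
 * receiver must have enabled SO_PASSCRED beforehand. */
static pid_t recv_creds_pid(int sock)
{
    char data;
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(struct ucred))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = u.buf,
                          .msg_controllen = sizeof(u.buf) };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_CREDENTIALS) {
            struct ucred uc;
            memcpy(&uc, CMSG_DATA(c), sizeof(uc));
            return uc.pid;
        }
    }
    return -1;
}
```

Each process inside the namespace must perform this exchange, and the receiver must remember the result, which is the bookkeeping burden described above.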

So having a direct method, like the API that Konstantin is proposing,
will work best for the Database, since the pid of a process in any of
the nested pid namespaces can be converted as and when required. I think
with the proposed API, the application should be able to convert the pid
of a process or the tid (gettid()) of a thread as well.

Can you explain what Oracle's database is planning to do with this information?

The Database uses the PID to programmatically find out whether the
process/thread is alive (kill 0), to send signals to the processes
requesting them to dump status/debug information, and to kill the
processes in case of a shutdown abort of the instance.
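The "kill 0" liveness probe mentioned here is a standard POSIX idiom; a minimal sketch follows (pid_is_alive() is a hypothetical name). Signal 0 is never delivered, but the kernel still performs the existence and permission checks. The pid must be valid in the caller's own pid namespace, which is exactly why a parent-namespace process first needs a translated pid.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Returns 1 if `pid` names an existing process in the caller's pid
 * namespace, 0 if it does not exist, -1 on other errors. */
static int pid_is_alive(pid_t pid)
{
    if (pid <= 0) {          /* 0 and negatives address process groups */
        errno = EINVAL;
        return -1;
    }
    if (kill(pid, 0) == 0)
        return 1;
    if (errno == EPERM)      /* exists, but we may not signal it */
        return 1;
    if (errno == ESRCH)      /* no such process in our namespace */
        return 0;
    return -1;
}
```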
What I'm wondering is: how does the caller of kill() end up
controlling a task whose pid it doesn't know in its own namespace?

I was generally describing how the DB would use the PID of a process.
The above description was for the case when no namespaces are used.

With the use of namespaces, the DB would convert the PIDs of processes
inside its child namespaces to PIDs in its own namespace and use those
PIDs to issue kill().
Seems vaguely sensible.

If I were designing this type of system, I'd have a manager process in
each namespace running as PID 1, though -- PID 1 is special and needs
to understand what's going on anyway.  Then PID 1 would do the kill()
calls and wouldn't need translate_pid().

Yes, this has been tried out with the prototype use of PID namespaces in
the DB. It works, but would be slow, as the manager would have to
exchange messages with the controlling processes, which would be in the
parent namespace. The DB could use the API to convert the pid instead.
