Re: [PATCH] proc.5: document /proc/[pid]/task/[tid]/children

Jann Horn <jann@xxxxxxxxx> · Mon, 15 Aug 2016 00:45:46 +0200

On Mon, Aug 15, 2016 at 01:13:59AM +0300, Cyrill Gorcunov wrote:
> On Sun, Aug 14, 2016 at 10:46:35PM +0200, Jann Horn wrote:
> > On Sun, Aug 14, 2016 at 11:14:41PM +0300, Cyrill Gorcunov wrote:
> > > On Sun, Aug 14, 2016 at 12:48:56PM +0200, Jann Horn wrote:
> > > > > 
> > > > > Hi! First of all, sorry for delay. Guys, this is not really true. The same
> > > > > applies to plain "ls /proc".
> > > > 
> > > > It does not. /proc is wobbly in a running system, /proc/$pid/children is
> > > > completely unreliable.
> > > 
> > > Nope -- look into how pids are instantiated: once the pids have been read, new
> > > ones may appear that you won't notice without re-reading. You may still miss
> > > freshly created pids.
> > 
> > That's pretty much inherent when you're inspecting a moving system - by the
> > time you've collected your information, it might be stale. So what?
> 
> What "what"? I told you that the information you fetch from a running system
> is only valid while the kernel does its work; once you jump back to userspace,
> the information might not be valid anymore. And @children does the
> same: in a native procfs read you may miss freshly created processes,
> in a @children read you may miss exited processes. It's the same nature.

"in @children read you may miss exited processes"? If that was the extent of it,
everything would be fine. But that's not what's happening. Read the comment in
get_children_pid():

	 * We might miss some children here if children
	 * are exited while we were not holding the lock,
	 * but it was never promised to be accurate that
	 * much.
	 *
	 * "Just suppose that the parent sleeps, but N children
	 *  exit after we printed their tids. Now the slow paths
	 *  skips N extra children, we miss N tasks." (c)

"skips N extra children". Those are *NOT* necessarily the same N children that
exited; they can also be children that were running before you started reading
the "children" file and are still running afterwards.

That's the big difference between the interfaces: Normal procfs reads might
miss new information or return outdated information, which is mostly inherent
when you're inspecting a running system, but the "children" interface can also
drop information about completely stable task relationships.
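
To make the failure mode concrete, here is a rough userspace model of it.
This is only a sketch, not the kernel code: it assumes the reader needs
several read() calls, and that every resume takes the slow path, i.e. skips
`pos` entries from the head of the sibling list. The function name, the tids
and the chunk size are made up for illustration.

```python
def read_children_in_chunks(children, chunk_size, exits_after_first_chunk):
    """Model a reader that fetches the child list over several read() calls.

    `children` is the live sibling list and is mutated in place. The reader
    resumes by skipping `pos` entries from the head of the list, which is
    what the slow path does once the previously printed task has exited.
    """
    seen = []
    pos = 0
    first = True
    while pos < len(children):
        chunk = children[pos:pos + chunk_size]
        seen.extend(chunk)
        pos += len(chunk)
        if first:
            # Children that were already printed exit between two reads.
            for tid in exits_after_first_chunk:
                children.remove(tid)
            first = False
    return seen


all_children = [101, 102, 103, 104, 105, 106]
exited = [101, 102]

seen = read_children_in_chunks(list(all_children), 2, exited)

# Children that were alive for the whole read but were never reported.
missed = [t for t in all_children if t not in exited and t not in seen]

print("seen:  ", seen)
print("missed:", missed)
```

In this model 103 and 104 never exit and never change parent, yet the reader
does not see them, because the two exits shifted the list under a
position-based cursor.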

> > > > I think these should work for obtaining a sufficiently consistent view
> > > > of the process structure of a running system.
> > > > 
> > > > But yeah, safely using this interface isn't easy, and more
> > > > inode-centered APIs for interaction with processes would be nice to
> > > > have. (E.g. an entry in /proc/$pid that points to the parent inode,
> > > > maybe a directory containing entries that point to the child inodes,
> > > > and process directory entries offering functionality equivalent to
> > > > syscalls like kill(), sched_setscheduler() and prlimit().)
> > > 
> > > Well, all this really wastes a huge amount of time, that's why we needed
> > > $children. In general, a more preferable way might be the task-diag interface
> > > which Andrew implemented (I'm not sure what state the series is in at the
> > > moment, or whether it has been merged: https://lkml.org/lkml/2016/4/11/932)
> > 
> > Yuck. Everything is PID-based? That's ugly.
> 
> That's just how it happened: processes are PID-based things.

PID-based interfaces suck unless you're the ptracer or reaper of all the
tasks you're inspecting, and an interface based on less reusable handles
(like procfs directory file descriptors or unique 64-bit identifiers or
whatever) would be much safer.

Yes, I know that all those traditional APIs use PIDs, but that doesn't
change that those interfaces suck. When you kill -9 a daemon that doesn't
quit when asked to quit, for example, there's a chance that the daemon
actually does quit and its PID is reallocated to some vital system
service just before you call kill() - and then your system breaks in
some unpleasant way.
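
A toy model of that race, for illustration only. Nothing here is a real
kernel or libc interface: `ToyKernel`, its methods and the per-process
unique id are hypothetical, and the allocator reuses the lowest free PID
immediately (Linux normally cycles through the PID space first), which just
makes the reuse easy to trigger.

```python
import itertools


class ToyKernel:
    """Toy process table with a tiny PID space (hypothetical, not Linux)."""

    def __init__(self, max_pid):
        self.max_pid = max_pid
        self.by_pid = {}                 # pid -> (unique_id, name)
        self._ids = itertools.count(1)   # unique ids are never reused

    def spawn(self, name):
        # Hand out the lowest free PID together with a fresh unique id.
        for pid in range(1, self.max_pid + 1):
            if pid not in self.by_pid:
                uid = next(self._ids)
                self.by_pid[pid] = (uid, name)
                return pid, uid
        raise RuntimeError("out of PIDs")

    def exit(self, pid):
        del self.by_pid[pid]

    def kill_by_pid(self, pid):
        # PID-only interface: kills whatever owns the PID right now.
        _uid, name = self.by_pid.pop(pid)
        return name

    def kill_by_handle(self, pid, uid):
        # Handle-based interface: refuses if the PID changed owner.
        entry = self.by_pid.get(pid)
        if entry is None or entry[0] != uid:
            return None
        del self.by_pid[pid]
        return entry[1]


kernel = ToyKernel(max_pid=2)
kernel.spawn("init")
daemon_pid, daemon_uid = kernel.spawn("daemon")

# The daemon quits on its own and its PID is handed to something else
# just before the administrator's kill arrives.
kernel.exit(daemon_pid)
kernel.spawn("vital-service")

victim_by_handle = kernel.kill_by_handle(daemon_pid, daemon_uid)
victim_by_pid = kernel.kill_by_pid(daemon_pid)

print("handle-based kill hit:", victim_by_handle)
print("PID-based kill hit:   ", victim_by_pid)
```

The handle-based call notices that the PID no longer belongs to the process
it was issued for and does nothing; the PID-based call takes down the
unrelated service.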