Re: new execve/kernel_thread design

[Sorry; forgot about that typo in Cc...  Repost to linux-arch alone]

On Tue, Oct 16, 2012 at 11:35:08PM +0100, Al Viro wrote:
> 	1.  Basic rules for process lifetime.
> Except for the initial process (init_task, eventual idle thread on the boot
> CPU) all processes are created by do_fork().  There are three classes of
> those: kernel threads, userland processes and idle threads to be.  There are
> few low-level operations involved:
> 	* a kernel thread can spawn a new kernel thread; the primitive
> doing that is kernel_thread().
> 	* a userland process can spawn a new userland process; that's
> done by sys_fork()/sys_vfork()/sys_clone()/sys_clone2().
> 	* a kernel thread can become a userland process.  The primitive
> is kernel_execve().
> 	* a kernel thread can spawn a future idle thread; that's done
> by fork_idle().  Result is *not* scheduled until the secondary CPU gets
> initialized and its state is heavily overwritten in process.

Minor correction: while the first two cases go through do_fork() to
copy_process() to copy_thread(), fork_idle() calls copy_process() directly.

> 	4. What is done?
> I've done the conversions for almost all architectures, but quite a few
> are completely untested.
> 
> I'm fairly sure about alpha, x86 and um.  Tested and I understand the
> architecture well enough.  arm, mips and c6x had been tested by architecture
> maintainers.  This stuff also works.  alpha, arm, x86 and um are fully
> converted in mainline by now.

arm64 fixed and tested by maintainer, put in no-rebase mode.

sparc corrected to avoid branching beyond what ba,pt allows, ACKed by Davem
in that form.  In no-rebase mode.

m68k tested and ACKed on coldfire; I think that along with aranym testing
here that is enough.  In no-rebase mode.

Surprisingly enough, ia64 one seems to work on actual hardware; I have sent
Tony an incremental patch cleaning copy_thread() up, waiting for results of
testing that on SMP box.

Even more surprisingly, unicore32 variant turned out to contain only one
obvious typo.  Fixed and tested by maintainer of unicore32 tree and actually
applied there, I've pulled his branch at that point.

microblaze: some fixes from Michal folded, still breakage with kernel_execve()
side of things.

Since there had been no signs of life from hexagon folks, I'd done (absolutely
blind and untested) tentative patches; see #arch-hexagon.  Same situation
as with most of the embedded architectures - i.e. take with a cartload of salt,
that pair of patches is intended to be a possible starting point for producing
something working.

At this point we have the following situation:
alpha                   done
arm                     done
arm64                   done
avr32                   untested
blackfin                untested
c6x                     done
cris                    untested
frv                     untested, maintainer going to test
h8300                   untested
hexagon                 untested
ia64                    apparently works, needs the final ACK from Tony
m32r                    untested
m68k                    done
microblaze              partially tested, maintainer hunting breakage down
mips                    done
mn10300                 untested
openrisc                maintainers said to have partially working variant
parisc                  should work, needs testing and ACK
powerpc                 should work, needs testing and ACK
s390                    should work, needs testing and ACK
score                   untested
sh                      untested, maintainers planned reviewing and testing
sparc                   done
tile                    maintainers writing that one
um                      done
unicore32               done
x86                     done
xtensa                  maintainers writing that one

One more thing: AFAICS, just about everything has something along the lines
of
	if (!usp)
		usp = <current userland sp>
	do_fork(flags, usp, ....)
in their sys_clone().  How about taking that into copy_thread()?  After
all, the logic there is
	copy all the state, including userland stack pointer to child
	override userland stack pointer with what the caller passed to
copy_thread()
often enough with "... and if we are about to override it with something
different, do the following extra work".  Turning that into
	copy all the state, including userland stack pointer to child
	if (usp) {
		override the userland stack pointer for child and maybe do
		some extra work
	}
would seem to be a fairly natural thing.  Does anybody see problems with
doing that on their architecture?  Note that with that fork() becomes
simply
#ifndef CONFIG_MMU
	return -EINVAL;
#else
	return do_fork(SIGCHLD, 0, current_pt_regs(), 0, NULL, NULL);
#endif
and similar for vfork().  And these can definitely drop the Cthulhu-awful
kludges for obtaining pt_regs (OK, on everything that doesn't do
kernel_thread() via syscall-from-kernel, but by now only xtensa is still
doing that).  In some cases we need to do a bit of work before that
(gather callee-saved registers so that the child could get them as on alpha,
mips, m68k, openrisc, parisc, ppc and x86, flush userland register windows
on sparc and get psr/wim values on sparc32), but a lot more architectures
lose the asm wrappers for those and the rest can get rid of assorted
ugliness involved in getting that struct pt_regs *.
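
To make the proposed change concrete, here is a minimal userland model of the
two conventions.  It is only a sketch: struct model_regs, copy_thread_old(),
copy_thread_new(), clone_old() and clone_new() are hypothetical names made up
for illustration, not the kernel's real types or signatures, and the "extra
work" some architectures do on override is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for an architecture's pt_regs. */
struct model_regs {
	unsigned long sp;	/* userland stack pointer */
	unsigned long pc;	/* anything else that gets copied to the child */
};

/* Current convention: the caller has already turned usp == 0 into the
 * parent's sp, so the override here is unconditional. */
static void copy_thread_old(struct model_regs *child,
			    const struct model_regs *parent,
			    unsigned long usp)
{
	*child = *parent;	/* copy all the state, including sp */
	child->sp = usp;	/* override with what the caller passed */
}

/* Model of what sys_clone() looks like on most architectures today. */
unsigned long clone_old(const struct model_regs *parent, unsigned long usp,
			struct model_regs *child)
{
	if (!usp)
		usp = parent->sp;	/* the check every arch repeats */
	copy_thread_old(child, parent, usp);
	return child->sp;
}

/* Proposed convention: usp == 0 means "keep the copied sp". */
static void copy_thread_new(struct model_regs *child,
			    const struct model_regs *parent,
			    unsigned long usp)
{
	*child = *parent;	/* copy all the state, including sp */
	if (usp)
		child->sp = usp;	/* override; arch-specific extra work
					 * would also go here */
}

/* With the check moved into copy_thread, the caller just passes usp on. */
unsigned long clone_new(const struct model_regs *parent, unsigned long usp,
			struct model_regs *child)
{
	copy_thread_new(child, parent, usp);
	return child->sp;
}
```

In this model clone_old() and clone_new() give the child the same stack
pointer for any usp, which is the point: passing 0 from fork()/vfork() is
enough and the per-architecture "if (!usp)" goes away.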

BTW, alpha seems to be doing absolutely pointless work on the way out of
sys_fork() et al. - saving callee-saved registers is needed, all right,
but why bother restoring all of them on the way out in the parent?  All
we need is rp; that's ~0.3Kb of useless reads from memory on each fork()...

The same goes for m68k; there the amount of traffic is less, but still, what
the hell for?  Child needs callee-saved registers restored (and usually will
have that done by switch_to()), but the parent needs only to make sure they
are saved and available for copy_thread() to bring them to child (incidentally,
copying registers is needed only when they are not embedded into task_struct.
At least um is doing a memcpy() for no reason whatsoever; fix will be sent
to rw shortly and ISTR seeing something similar on some of the other
architectures).

Another cross-architecture thing: folks, watch out for what's being done with
thread flags; I've just found a lovely bug on alpha where we have prctl(2)
doing non-atomic modifications of those (as in ti->flags = (ti->flags&~x)|y;),
which is obviously broken; TIF_SIGPENDING can be set asynchronously and even
from an interrupt.  Fix for this one is going to Linus shortly (adding
a separate field for thread-synchronous flags, taking the obviously
thread-synchronous ones there, including the UAC_... bunch set by that
prctl()), but I don't think
that I can audit that for all architectures efficiently; cursory look has
found a braino on frv (fix being discussed with dhowells), but there may bloody
well be more of that fun.
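
For anyone auditing their own architecture, the failure mode can be modelled
in plain C11.  This is a sketch under assumptions, not the alpha code: struct
model_thread_info, the MODEL_* bit values and both helper functions are
invented for illustration, and the asynchronous setter is simulated by an
explicit call placed between the read and the write instead of a real
interrupt.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical bit layout, chosen only for this model. */
#define MODEL_TIF_SIGPENDING	(1UL << 2)	/* set asynchronously */
#define MODEL_UAC_MASK		(7UL << 8)	/* set only by the thread itself */

struct model_thread_info {
	atomic_ulong flags;	/* can be modified from other contexts */
	unsigned long status;	/* thread-synchronous: owner-only access */
};

/* Broken pattern: ti->flags = (ti->flags & ~x) | y done as separate read
 * and write.  The asynchronous set lands between the two halves and is
 * overwritten by the stale value. */
unsigned long racy_update(struct model_thread_info *ti,
			  unsigned long clear, unsigned long set)
{
	unsigned long old = atomic_load(&ti->flags);		/* read */

	atomic_fetch_or(&ti->flags, MODEL_TIF_SIGPENDING);	/* "interrupt" */
	atomic_store(&ti->flags, (old & ~clear) | set);		/* stale write */
	return atomic_load(&ti->flags);
}

/* Fix along the lines described above: thread-synchronous bits live in a
 * separate field, so a plain read-modify-write on them cannot clobber
 * anything that is set asynchronously. */
unsigned long sync_update(struct model_thread_info *ti,
			  unsigned long clear, unsigned long set)
{
	ti->status = (ti->status & ~clear) | set;		/* owner only */
	atomic_fetch_or(&ti->flags, MODEL_TIF_SIGPENDING);	/* "interrupt" */
	return atomic_load(&ti->flags);
}
```

With the shared word, the pending-signal bit is lost after racy_update();
with the split layout it survives sync_update() and the prctl()-style bits
end up in status.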