David VomLehn wrote:
On Tue, Apr 21, 2009 at 02:30:55PM -0700, David Daney wrote:
This patch set (against 2.6.29.1) creates a vdso and moves the signal
trampolines to it from their previous home on the stack.
Tested with a 64-bit kernel on a Cavium Octeon cn3860 where I have the
following results from lmbench2:
Before:
n64 - Signal handler overhead: 14.517 microseconds
n32 - Signal handler overhead: 14.497 microseconds
o32 - Signal handler overhead: 16.637 microseconds
After:
n64 - Signal handler overhead: 7.935 microseconds
n32 - Signal handler overhead: 7.334 microseconds
o32 - Signal handler overhead: 8.628 microseconds
Nice numbers, and something that will be even more critical as real-time
features are added and used!
Non-SMP systems will probably not see as much improvement.
Although the numbers are nice, they are not the primary motivation
behind the patch. The real gains are in not having to interrupt all
cores to invalidate their caches, and the possibility of eXecute Inhibit
on the stack.
Comments encouraged.
Only one comment, which I would not want to hold up acceptance:
based on some numbers sent out recently, it looks like the kernel is
experiencing some performance issues with exec() and I think this change will
make it slightly slower. You could avoid this by deferring installation of
the trampoline to the first use of a system call that registers a signal
handler.
I should try to measure this too, although this is what x86 et al. do.
It is by far simpler and less prone to bugs than trying to hook
into the system calls. After an executable has had the chance to start
additional threads and establish arbitrary mappings things get complicated.
David Daney