The kernel on our Linux system doesn't appear to have these two settings, according to the list provided by sysctl -a. Please pardon my ignorance, but should I add them?

We have PostgreSQL 9.0 on Linux 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

Thanks,
Midge

----- Original Message -----
Sent: Wednesday, January 02, 2013 1:46 PM
Subject: Two Necessary Kernel Tweaks for Linux Systems

Hey everyone!

After much testing and hair-pulling, we've confirmed two kernel settings that should always be modified on production Linux systems, especially newer ones that use the Completely Fair Scheduler (CFS) rather than the older O(1) scheduler.

If you want to follow along, these are:

/proc/sys/kernel/sched_migration_cost
/proc/sys/kernel/sched_autogroup_enabled

which correspond to the sysctl settings:

kernel.sched_migration_cost
kernel.sched_autogroup_enabled
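
A quick way to check whether a given kernel exposes these settings at all is to look for the files under /proc/sys, since each sysctl name is just the /proc/sys path with the slashes turned into dots. Below is a minimal Python sketch under that assumption; sysctl_key and read_setting are hypothetical helper names, not part of any tool. Given the note further down that RHEL 5.x still used the O(1) scheduler, a 2.6.18 kernel reporting both as missing would not be surprising.

```python
from pathlib import Path
from typing import Optional

PROC_SYS = Path("/proc/sys")

# The two files named in this message.
SETTINGS = [
    "/proc/sys/kernel/sched_migration_cost",
    "/proc/sys/kernel/sched_autogroup_enabled",
]


def sysctl_key(proc_path: str) -> str:
    """Turn a /proc/sys path into its dotted sysctl name."""
    return ".".join(Path(proc_path).relative_to(PROC_SYS).parts)


def read_setting(proc_path: str, root: Path = PROC_SYS) -> Optional[str]:
    """Return the current value, or None if the file is missing or unreadable."""
    path = root / Path(proc_path).relative_to(PROC_SYS)
    try:
        return path.read_text().strip()
    except OSError:  # includes FileNotFoundError: the kernel lacks this knob
        return None


if __name__ == "__main__":
    for p in SETTINGS:
        value = read_setting(p)
        shown = value if value is not None else "missing or unreadable"
        print(f"{sysctl_key(p)}: {shown}")
```

The root parameter defaults to the real /proc/sys; it is only there so the lookup can be pointed at a scratch directory for testing.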

What do these settings do?
--------------------------

* sched_migration_cost

The migration cost is the amount of time after a task last ran during which the scheduler still considers it "cache hot", and thus less likely to be migrated. By default, this is 0.5ms (500000 ns), and as the process table grows, this default eventually causes the scheduler to break down. On our systems, performance degraded smoothly as the connection count increased until we crossed some invisible threshold; at that point, system CPU jumped from 20% to a sustained 70%, and TPS dropped by a factor of 5 to 10. For us, that threshold was a pgbench run with 900 or more clients.

The migration cost should be increased on almost any server system running many processes, which means hosts running PostgreSQL or Apache would benefit from a higher migration cost. We've had good luck with a setting of 5ms (5000000 ns) instead.
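
Because the value is in nanoseconds, it is easy to be off by a factor of 1000. A small sketch of the conversion, producing a line in /etc/sysctl.conf format; the helper names are hypothetical, and the key is the name given in this message (other kernel versions may name it differently).

```python
NS_PER_MS = 1_000_000


def ms_to_ns(ms: float) -> int:
    """Convert milliseconds to the nanoseconds sched_migration_cost expects."""
    return round(ms * NS_PER_MS)


def sysctl_conf_line(key: str, value: int) -> str:
    """Format one 'key = value' line in /etc/sysctl.conf style."""
    return f"{key} = {value}"


default_ns = ms_to_ns(0.5)  # default quoted in this message
tuned_ns = ms_to_ns(5)      # value this message reports good results with

if __name__ == "__main__":
    print(sysctl_conf_line("kernel.sched_migration_cost", tuned_ns))
```

The printed line could go into /etc/sysctl.conf and be loaded with sysctl -p, or the same value could be applied immediately with sysctl -w kernel.sched_migration_cost=5000000.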

When the breakdown occurs, system CPU (as reported by sar) rises from 20% to over 70% during a heavy pgbench run (scale 3500 on a 72GB system), and %nice/%user drops by half or more. A higher migration cost essentially eliminates this artificial throttle.
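
sar derives its CPU percentages from the counters in /proc/stat. As a rough illustration of how a %system figure like the 20% and 70% above is computed, here is a sketch that diffs two samples of the aggregate cpu line. It is an approximation, not sar's exact formula: it counts only the system field, whereas sar may fold interrupt time into %system depending on version and options. The function names are hypothetical.

```python
from typing import List


def parse_cpu_line(line: str) -> List[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into jiffy counters.

    Keeps up to eight fields: user, nice, system, idle, iowait, irq,
    softirq, steal. Guest time is already included in user, so it is skipped.
    """
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:9]]


def system_pct(before: List[int], after: List[int]) -> float:
    """Percent of elapsed CPU time spent in kernel (system) mode."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    return 100.0 * deltas[2] / total if total else 0.0


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the first line of /proc/stat, the all-CPU summary."""
    with open(path) as f:
        return f.readline()
```

In practice one would call parse_cpu_line(read_cpu_line()) twice, a second or so apart, and pass both results to system_pct.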

* sched_autogroup_enabled

This is a relatively new patch that Linus praised back in late 2010. It groups tasks by TTY to improve perceived responsiveness. On server systems, however, large daemons like PostgreSQL are launched from the same pseudo-TTY and can be effectively starved of CPU cycles in favor of less important tasks.

The default setting is 1 (enabled) on some platforms. By setting this to 0 (disabled), we saw an outright 30% performance boost on the same pgbench test. A fully cached scale 3500 database on a 72GB system went from 67k TPS to 82k TPS with 900 client connections.
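
Disabling autogroup at runtime amounts to writing 0 into the corresponding /proc/sys file, which is what sysctl -w kernel.sched_autogroup_enabled=0 does. A sketch of that, assuming root privileges when pointed at the real /proc/sys; set_sysctl is a hypothetical helper, and its root parameter exists only so it can be exercised against a scratch directory.

```python
from pathlib import Path


def set_sysctl(key: str, value: int, root: Path = Path("/proc/sys")) -> bool:
    """Write a sysctl value via the /proc/sys tree.

    Returns False (and writes nothing) if this kernel does not expose the key.
    Under the real /proc/sys this needs root, and the change lasts only until
    reboot; /etc/sysctl.conf is the place to make it persistent.
    """
    path = root.joinpath(*key.split("."))  # kernel.x -> <root>/kernel/x
    if not path.exists():
        return False
    path.write_text(f"{value}\n")
    return True
```

Run as root, set_sysctl("kernel.sched_autogroup_enabled", 0) would disable autogroup until the next reboot, and would simply return False on a kernel without the setting.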

Total Benefit
-------------

At higher connection counts, such as on systems that can't use pooling or that make extensive use of prepared queries, these settings can massively affect performance. At 900 connections, our test systems ran at 17k TPS unaltered, but 85k TPS after these two modifications. Even with this performance boost, we still had 40% CPU free instead of 0%. In effect, the new scheduler's logarithmic scaling is restored even with large process tables.

Some systems will have a higher "cracking" point than others. The effect is amplified when a system is under high memory pressure, so running a lot of expensive queries on a high number of concurrent connections is the easiest way to replicate these results.
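
The message doesn't list the exact pgbench options used, so the following is only a sketch of how a comparable high-connection test might be assembled. Scale 3500 and 900 clients come from the message; the thread count, the 300-second duration, the select-only (-S) and prepared-statement (-M prepared) modes, and the database name bench are assumptions, and the helper names are hypothetical. The commands are only built and printed, not executed.

```python
from typing import List


def pgbench_init_cmd(scale: int, dbname: str) -> List[str]:
    """Command to create the pgbench tables at a given scale factor."""
    return ["pgbench", "-i", "-s", str(scale), dbname]


def pgbench_run_cmd(clients: int, threads: int, seconds: int, dbname: str,
                    select_only: bool = True, prepared: bool = True) -> List[str]:
    """Command for a timed pgbench run; built here, not executed."""
    # Older pgbench releases require clients to be a multiple of threads.
    if clients % threads != 0:
        raise ValueError("clients must be a multiple of threads")
    cmd = ["pgbench", "-c", str(clients), "-j", str(threads), "-T", str(seconds)]
    if select_only:
        cmd.append("-S")           # built-in read-only SELECT script
    if prepared:
        cmd += ["-M", "prepared"]  # use prepared statements
    cmd.append(dbname)
    return cmd


if __name__ == "__main__":
    print(" ".join(pgbench_init_cmd(3500, "bench")))
    print(" ".join(pgbench_run_cmd(900, 36, 300, "bench")))
```

The divisibility check is conservative: it mirrors the rule in older pgbench releases, such as those contemporary with 9.0, and later releases may relax it.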

Admins migrating from older systems (RHEL 5.x) may find this especially shocking, because the old O(1) scheduler was too "stupid" to have these advanced features, so it was impossible to cause this kind of behavior there.

There's probably still a little room for improvement here, since 30-40% CPU is still unclaimed in our larger tests. I'd like to see the total performance drop (from an ideal 175k TPS at 24 connections) reduced. But these kernel tweaks rarely seem to be discussed anywhere, and there's no apparent consensus on how these (and other) scheduler settings should be modified under different usage scenarios.

I just figured I'd share, since we found this info so beneficial.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@xxxxxxxxxxxxxxxx

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance