Hey guys,
This isn't a question, but a summary of a ton of investigation I've
been doing since a recent "upgrade". Anyone else out there with "big
iron" might want to confirm this, but it seems pretty reproducible.
This seems to affect the latest 3.2 mainline and, by extension, any
platform using it. My tests are restricted to Ubuntu 12.04, but it may
apply elsewhere.
Comparing the latest official 3.2 kernel to the latest official 3.4
kernel (both Ubuntu), there are some rather striking differences. I'll
start with some pgbench tests.
* This test is 800 read-only clients, with 2 controlling threads on a
  55GB database (scaling factor of 3600) for 3 minutes.
  * With 3.4:
    * Max TPS was 68933.
    * CPU was between 50 and 55% idle.
    * Load average was between 10 and 15.
  * With 3.2:
    * Max TPS was 17583. A loss of roughly 75% in throughput.
    * CPU was between 12 and 25% idle.
    * Load average was between 10 and 60, effectively random.
* Next, we checked minimal write tests. This time, with only two
  clients. All other parameters are the same.
  * With 3.4:
    * Max TPS was 4548.
    * CPU was between 88 and 92% idle.
    * Load average was between 1.7 and 2.5.
  * With 3.2:
    * Max TPS was 4639.
    * CPU was between 88 and 92% idle.
    * Load average was between 3 and 4.
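
For anyone who wants to reproduce these runs, the two tests above map roughly onto the pgbench invocations below. This is only a sketch, not the exact commands used: the database name "pgbench" is an assumption, and I'm assuming the write test uses pgbench's default TPC-B-like script with the same two threads and a database initialized with "pgbench -i -s 3600". By default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/usr/bin/env bash
# Sketch of the two pgbench runs described above.
# Assumptions (not from the original tests): database name "pgbench", and the
# write test uses the default TPC-B-like script.

DB="${DB:-pgbench}"      # hypothetical database name
DURATION=180             # 3 minutes, in seconds
THREADS=2                # controlling (worker) threads

# Print the pgbench command line for a given test type.
pgbench_cmd() {
    case "$1" in
        read)  echo "pgbench -S -c 800 -j $THREADS -T $DURATION $DB" ;;  # 800 select-only clients
        write) echo "pgbench -c 2 -j $THREADS -T $DURATION $DB" ;;       # 2 read/write clients
        *)     echo "unknown test: $1" >&2; return 1 ;;
    esac
}

for t in read write; do
    cmd=$(pgbench_cmd "$t")
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$cmd"      # dry run: show what would be executed
    else
        $cmd             # run the benchmark for real
    fi
done
```

Keeping the command construction in one function makes it easy to run the identical workload after booting each kernel.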
Overall, performance was _much_ worse in 3.2 by almost every metric
except for very low contention activity. More CPU for fewer
transactions, and wildly inaccurate load reporting. The 3.2 kernel in
its current state should be considered detrimental and potentially
malicious under high task contention.
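
To put a number on the difference, here is a small sketch of the arithmetic using the Max TPS figures above. The function name tps_loss_pct is just something made up for illustration; it computes (baseline - degraded) / baseline as a percentage.

```shell
#!/usr/bin/env bash
# Relative throughput loss of 3.2 versus 3.4, from the Max TPS figures above.
tps_loss_pct() {
    # $1 = baseline TPS (3.4), $2 = comparison TPS (3.2)
    awk -v base="$1" -v cur="$2" 'BEGIN { printf "%.1f\n", (base - cur) / base * 100 }'
}

tps_loss_pct 68933 17583    # read-only test: prints 74.5
tps_loss_pct 4548 4639      # write test: prints -2.0 (negative means 3.2 was slightly faster)
```

So the read-only loss is about 74.5%, while the two-client write test differs by only about 2% in the other direction.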
I'll admit I didn't let the tests run for more than 10 iterations, but I
didn't really need more than that. Even one iteration is enough to see
this in action. At least every Ubuntu 3.2 kernel since 3.2.0-31 exhibits
this, but I haven't tested further back. I've also examined both
official Ubuntu 3.2 and Ubuntu mainline kernels as obtained from here:
http://kernel.ubuntu.com/~kernel-ppa/mainline
The 3.2.34 mainline also has these problems. For reference, I tested the
3.4.20 Quantal release on Precise because the Precise 3.4 kernel hasn't
been maintained.
Again, anyone running 12.04 LTS, take a good hard look at your systems.
Hopefully you have a spare machine to test with. I'm frankly appalled
this thing is in an LTS release.
I'll also note that on all kernels, client threads inflate load reports
to some extent. In a pgbench for-loop (run, sleep 1, repeat), sometimes
load will jump to some very high number between iterations, but on a
3.4, it will settle down again. On a 3.2, it just jumps randomly. I
tested that with this script:
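#!/bin/bash
# Print D-state and R-state counts plus the 1-minute load average, once per second.
nLoop=0
while true; do
    # Reprint the column header every 20 samples.
    if (( nLoop % 20 == 0 )); then
        echo -e "Stat Time\t\tSleep\tRun\tLoad Avg"
    fi
    stattime=$(date +"%Y-%m-%d %H:%M:%S")
    # Count ps entries in uninterruptible sleep (D) and running/runnable (R).
    nsleep=$(ps -emo stat | grep -c 'D')
    nrun=$(ps -emo stat | grep -c 'R')
    loadavg=$(cut -d ' ' -f 1 /proc/loadavg)
    echo -e "${stattime}\t${nsleep}\t${nrun}\t${loadavg}"
    sleep 1
    nLoop=$(( nLoop + 1 ))
done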
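
As a possible cross-check on the ps-based counts, /proc/loadavg itself carries the kernel's own count of currently runnable scheduling entities in its fourth field (runnable/total), next to the 1, 5 and 15 minute averages. The sketch below parses one such line; parse_loadavg is a made-up helper name, and the fallback sample line contains invented values that are only there so the snippet runs on systems without /proc.

```shell
#!/usr/bin/env bash
# Parse one line in /proc/loadavg format:
#   1min 5min 15min runnable/total last_pid
# Prints: 1min, 5min, 15min, runnable count (tab-separated).
parse_loadavg() {
    local one five fifteen entities _lastpid
    read -r one five fifteen entities _lastpid <<< "$1"
    printf '%s\t%s\t%s\t%s\n' "$one" "$five" "$fifteen" "${entities%%/*}"
}

# Use the live file when it exists (Linux); otherwise use an invented sample line.
if [ -r /proc/loadavg ]; then
    parse_loadavg "$(cat /proc/loadavg)"
else
    parse_loadavg "7.66 5.12 3.40 16/1200 4242"
fi
```

Note that this field only counts runnable entities, so it does not replace the D-state count from ps.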
The jumps look like this:
Stat Time            Sleep  Run  Load Avg
2012-12-05 12:23:13  0      16   7.66
2012-12-05 12:23:14  0      12   7.66
2012-12-05 12:23:15  0      7    7.66
2012-12-05 12:23:16  0      17   7.66
2012-12-05 12:23:17  0      1    24.51
2012-12-05 12:23:18  0      2    24.51
It's much harder to trigger on 3.4, but it still happens.
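
If you capture the monitoring output to a file, something like the following could pick out the jumps automatically. This is a rough sketch: find_load_jumps is a hypothetical helper, and the threshold of 5 is an arbitrary value chosen for illustration, not something derived from the tests.

```shell
#!/usr/bin/env bash
# Flag samples where the 1-minute load average changes by more than a
# threshold between consecutive readings. THRESHOLD is an arbitrary choice.
THRESHOLD="${THRESHOLD:-5}"

find_load_jumps() {
    awk -v limit="$THRESHOLD" '
        $1 == "Stat" { next }                 # skip the repeated header line
        NF < 5       { next }                 # skip blank or malformed lines
        seen && ($5 - prev > limit || prev - $5 > limit) {
            printf "%s %s load %.2f -> %.2f\n", $1, $2, prev, $5
        }
        { prev = $5; seen = 1 }               # remember this sample for the next one
    '
}

# Example: the six samples shown above.
find_load_jumps <<'EOF'
Stat Time            Sleep  Run  Load Avg
2012-12-05 12:23:13  0      16   7.66
2012-12-05 12:23:14  0      12   7.66
2012-12-05 12:23:15  0      7    7.66
2012-12-05 12:23:16  0      17   7.66
2012-12-05 12:23:17  0      1    24.51
2012-12-05 12:23:18  0      2    24.51
EOF
```

On the sample above this reports the single jump at 12:23:17, from 7.66 to 24.51.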
If anyone has tested against 3.6 or 3.7, I'd love to hear your input.
Inconsistent load reports are one thing... strangled performance and
inflated CPU usage are quite another.
--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@xxxxxxxxxxxxxxxx
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)