Re: strange pgbench results (as if blocked at the end)

Greg Smith <greg@xxxxxxxxxxxxxxx> · Fri, 12 Aug 2011 23:09:07 -0400

On 08/12/2011 07:37 PM, Tomas Vondra wrote:

> I've run nearly 200 of these, and in about 10 cases I got something that
> looks like this:
>
> http://www.fuzzy.cz/tmp/pgbench/tps.png
> http://www.fuzzy.cz/tmp/pgbench/latency.png
>
> i.e. it runs just fine for about 3:40 and then something goes wrong. The
> bench should take 5:00 minutes, but it somehow locks, does nothing for
> about 2 minutes and then all the clients end at the same time. So instead
> of 5 minutes the run actually takes about 6:40.
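Spotting runs like these by eye gets tedious across ~200 results. Here is a minimal sketch that flags every stretch of near-zero throughput. It assumes the pgbench -l per-transaction log has already been bucketed into transactions completed per second; that aggregation step is not shown, and the function name find_stalls and its parameters are hypothetical.

```python
from typing import List, Tuple


def find_stalls(tps_per_second: List[float], min_gap: int = 10,
                threshold: float = 0.0) -> List[Tuple[int, int]]:
    """Return (start_second, length_in_seconds) for every run of at least
    `min_gap` consecutive seconds whose throughput is <= `threshold`.

    Hypothetical helper: input is one TPS value per second of the run."""
    stalls: List[Tuple[int, int]] = []
    start = None
    for second, tps in enumerate(tps_per_second):
        if tps <= threshold:
            if start is None:
                start = second  # a quiet stretch begins here
        else:
            # Throughput resumed; record the quiet stretch if it was long enough.
            if start is not None and second - start >= min_gap:
                stalls.append((start, second - start))
            start = None
    # A stall that lasts until the final sample (clients all ending together) counts too.
    if start is not None and len(tps_per_second) - start >= min_gap:
        stalls.append((start, len(tps_per_second) - start))
    return stalls


if __name__ == "__main__":
    # 220 s of normal load, a 120 s freeze, then 60 s more load.
    series = [500.0] * 220 + [0.0] * 120 + [500.0] * 60
    print(find_stalls(series))  # [(220, 120)]
```

Raising `threshold` above zero would also catch runs where a few transactions still trickle through during the freeze.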

You need to run tests like these for 10 minutes to see a full
checkpoint cycle; then you'll likely see the stall on most runs,
instead of only 5% of them.  Some of your tests are probably finishing
before the first checkpoint does, which is why you don't see the bad
behavior every time.
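A rough way to see why run length matters is to model when checkpoint sync phases would start inside a run. This sketch assumes purely time-triggered checkpoints (it ignores checkpoints forced by WAL volume), uses PostgreSQL 9.1's defaults of checkpoint_timeout = 5min and checkpoint_completion_target = 0.5, and treats the sync phase as starting once the spread-out write phase ends. The function and its `first_checkpoint_at` offset are hypothetical.

```python
from typing import List


def sync_phase_starts(run_seconds: float,
                      first_checkpoint_at: float,
                      checkpoint_timeout: float = 300.0,
                      completion_target: float = 0.5) -> List[float]:
    """Seconds into the run at which a checkpoint sync phase would begin.

    Simplified model: checkpoints start every `checkpoint_timeout` seconds,
    beginning `first_checkpoint_at` seconds into the run, and each write
    phase lasts `completion_target * checkpoint_timeout` seconds."""
    starts: List[float] = []
    checkpoint = first_checkpoint_at
    while checkpoint < run_seconds:
        sync_at = checkpoint + completion_target * checkpoint_timeout
        if sync_at < run_seconds:
            starts.append(sync_at)
        checkpoint += checkpoint_timeout
    return starts


if __name__ == "__main__":
    print(sync_phase_starts(300, first_checkpoint_at=300))  # [] : 5 min run, no sync
    print(sync_phase_starts(300, first_checkpoint_at=60))   # [210.0]
    print(sync_phase_starts(600, first_checkpoint_at=300))  # [450.0]
```

Under this model, whether a 5-minute run hits a sync phase depends on where the server's checkpoint timer happens to sit when pgbench starts, while a 600-second run contains at least one sync phase for any offset between 0 and 300 seconds.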

The long pauses are most likely every client blocking once the
checkpoint sync phase runs.  When those fsync calls go out, Linux on
ext3 can freeze for quite a while.  In this example, the drop in TPS
and rise in latency at around 50:30 is either the beginning of a
checkpoint or Linux exceeding its dirty_background_ratio threshold; the
two tend to happen around the same time.  The checkpoint executes its
write phase for a bit, then enters the sync phase around 51:40.  You
can find a couple of examples just like this in my giant set of tests
around the work committed as the fsync compaction feature in 9.1, all
at http://www.2ndquadrant.us/pgbench-results/index.htm
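The two checkpoint phases described above can be illustrated with a toy model: a batch of write() calls that normally land in the OS page cache, followed by fsync() on each file, which is where a struggling filesystem stalls. This is a minimal sketch for timing the two phases on a test filesystem, not PostgreSQL's checkpointer code; the function name, file names, and sizes are assumptions (8192 bytes matches PostgreSQL's default page size).

```python
import os
import tempfile
import time
from typing import Dict, List


def write_then_sync(paths: List[str], block: bytes,
                    blocks_per_file: int) -> Dict[str, float]:
    """Toy checkpoint: write every file, then fsync every file.

    Returns elapsed seconds for the write phase and the sync phase."""
    fds = [os.open(p, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
           for p in paths]
    try:
        t0 = time.monotonic()
        # Write phase: these writes usually just dirty the page cache.
        for fd in fds:
            for _ in range(blocks_per_file):
                os.write(fd, block)
        t1 = time.monotonic()
        # Sync phase: fsync forces the dirty data out to disk; on a
        # filesystem that handles this badly, this is where everything waits.
        for fd in fds:
            os.fsync(fd)
        t2 = time.monotonic()
    finally:
        for fd in fds:
            os.close(fd)
    return {"write": t1 - t0, "sync": t2 - t1}


if __name__ == "__main__":
    # Point dir= at the filesystem under test (ext3, ext4, XFS...) to compare.
    with tempfile.TemporaryDirectory() as d:
        paths = [os.path.join(d, f"seg{i}") for i in range(4)]
        print(write_then_sync(paths, b"\0" * 8192, 1024))
```

On a quiet system with a small working set both numbers will be tiny; the sync time is what grows once there are gigabytes of dirty data behind it.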

The one most similar to your case is
http://www.2ndquadrant.us/pgbench-results/481/index.html
Had that test only run for 5 minutes, it would have looked just like
yours, ending after the long pause that's in the middle of my run.  The
freeze was over 3 minutes long in that example.  (My server has a
fairly fast disk subsystem, probably faster than what you're testing,
but it also has 8GB of RAM it can dirty, which more than makes up for
it.)
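Some back-of-the-envelope arithmetic shows why a large amount of RAM can outweigh a fast disk. This sketch approximates the background writeback threshold as a percentage of total RAM. The real kernel calculation uses dirtyable memory rather than total RAM, so this is an upper-bound estimate. The 10% ratio and 10 MiB/s random-write rate are illustrative assumptions, not measurements from either server.

```python
def background_writeback_threshold(ram_bytes: int,
                                   dirty_background_ratio: int) -> int:
    """Approximate bytes of dirty page cache allowed before background
    writeback starts (assumption: ratio applied to total RAM)."""
    return ram_bytes * dirty_background_ratio // 100


def seconds_to_flush(dirty_bytes: int, write_bytes_per_sec: float) -> float:
    """Time to write out `dirty_bytes` at a sustained write rate."""
    return dirty_bytes / write_bytes_per_sec


if __name__ == "__main__":
    ram = 8 * 1024 ** 3  # 8GB server
    dirty = background_writeback_threshold(ram, 10)  # assumed 10%
    print(dirty)  # 858993459 bytes, roughly 819 MiB
    # Assumed 10 MiB/s of random writes: roughly 82 seconds to drain.
    print(round(seconds_to_flush(dirty, 10 * 1024 ** 2), 2))  # 81.92
```

If fsync has to wait for most of that backlog, a pause measured in minutes is not surprising even on good hardware.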

In my tests, I switched from ext3 to XFS to get better behavior.  You
got the same sort of benefit from ext4.  ext3 just handles its write
cache filling up, followed by fsync calls, very poorly.  I've given up
on that as an unsolvable problem; to me, improving behavior on XFS and
ext4 is the only problem worth worrying about now.

And I keep seeing too many data corruption issues on ext4 to recommend
that anyone use it for PostgreSQL yet; that's why I focused on XFS.
ext4 still needs at least a few more months before all the bug fixes
it has received in later kernels are backported to the 2.6.32 versions
deployed in RHEL6 and Debian Squeeze, the newest Linux distributions my
customers care about right now.  On RHEL6, for example, go read
http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/6.1_Technical_Notes/kernel.html
(specifically BZ#635199) and tell me whether that sounds like it's
considered stable code yet.  "The block layer will be updated in
future kernels to provide this more efficient mechanism of ensuring
ordering...these future block layer improvements will change some
kernel interfaces..."  Yikes; that does not inspire confidence in me.

-- 
Greg Smith   2ndQuadrant US    greg@xxxxxxxxxxxxxxx   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us