Re: strange pgbench results (as if blocked at the end)

<tv@xxxxxxxx> · Sun, 14 Aug 2011 14:51:37 +0200

On 13 August 2011, 5:09, Greg Smith wrote:
> The long pauses are most likely every client blocking once the
> checkpoint sync runs.  When those fsync calls go out, Linux will freeze
> for quite a while there on ext3.  In this example, the drop in TPS/rise
> in latency at around 50:30 is either the beginning of a checkpoint or
> the dirty_background_ratio threshold in Linux being exceeded; they tend
> to happen around the same time.  It executes the write phase for a bit,
> then gets into the sync phase around 51:40.  You can find a couple of
> examples just like this on my giant test set around what was committed
> as the fsync compaction feature in 9.1, all at
> http://www.2ndquadrant.us/pgbench-results/index.htm
> 
> The one most similar to your case is
> http://www.2ndquadrant.us/pgbench-results/481/index.html  Had that test
> only run for 5 minutes, it would have looked just like yours, ending
> after the long pause that's in the middle on my run.  The freeze was
> over 3 minutes long in that example.  (My server has a fairly fast disk
> subsystem, probably faster than what you're testing, but it also has 8GB
> of RAM that it can dirty to more than make up for it).

I guess you're right - I was thinking about checkpoints too, but what
really puzzled me was that only some of the runs (with about the same
workload) were affected by that.

It's probably a timing issue - the tests ran for 5 minutes and the
checkpoint timeout is 5 minutes too. So in the runs where the checkpoint
timed out early it had very little data to write, but when it timed out
just before the end it had much more data to write.
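
A rough sketch of that timing argument, not a model of what PostgreSQL
really does: it assumes pgbench dirties data at a constant rate, that
nothing is dirty when the run starts, and it ignores
checkpoint_segments and the kernel's own writeback. The function name
and its parameters are made up for illustration.

```python
def checkpoints_during_run(run_s, timeout_s, first_at_s):
    """List the timed checkpoints that start inside one benchmark run.

    Returns (start_time_s, seconds_of_writes_to_flush) pairs. The second
    value is only a proxy for the amount of dirty data: the time elapsed
    since the previous checkpoint (or since the start of the run).
    """
    events = []
    previous = 0          # assumption: the run starts with nothing dirty
    t = first_at_s        # when the first timeout happens to fire
    while t < run_s:
        events.append((t, t - previous))
        previous = t
        t += timeout_s
    return events


# 5-minute run, 5-minute timeout: one checkpoint, size depends on luck.
early = checkpoints_during_run(300, 300, first_at_s=30)
late = checkpoints_during_run(300, 300, first_at_s=290)
print(early, late)
```

With these assumptions the early case has 30 seconds of writes to
flush and the late case has 290, which is the difference between the
unaffected runs and the ones that seem to block at the end.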

I've increased the test duration to 10 minutes, decreased the
checkpoint timeout to 4 minutes, and a checkpoint is now issued just
before each pgbench run. That way the starting position should be more
or less the same for all runs.
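
Roughly, each run could be driven like this. It is only a sketch: the
database name and the client count are placeholders, the helper names
are hypothetical, and it assumes checkpoint_timeout = 4min has already
been set in postgresql.conf and that psql connects as a superuser
(CHECKPOINT requires it).

```python
import subprocess


def build_run_commands(dbname, duration_s=600, clients=1):
    """Build the two commands for one run: a forced checkpoint, then pgbench.

    dbname and clients are placeholders; duration_s=600 is the 10-minute
    run length mentioned above.
    """
    checkpoint_cmd = ["psql", "-d", dbname, "-c", "CHECKPOINT"]
    pgbench_cmd = ["pgbench", "-c", str(clients), "-T", str(duration_s), dbname]
    return [checkpoint_cmd, pgbench_cmd]


def run_once(dbname, **kwargs):
    """Run the checkpoint and then the benchmark, stopping on any failure."""
    for cmd in build_run_commands(dbname, **kwargs):
        subprocess.run(cmd, check=True)
```

Calling run_once("bench") would then flush everything first, so the
timed checkpoints should land at about the same points in every run.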

> In my tests, I switched from ext3 to XFS to get better behavior.  You
> got the same sort of benefit from ext4.  ext3 just doesn't handle its
> write cache filling and then having fsync calls execute very well.  I've
> given up on that as an unsolvable problem; improving behavior on XFS and
> ext4 are the only problems worth worrying about now to me.

For production systems, XFS seems like a good choice. The purpose of
the tests I've run was merely to see the effect of various block sizes
and mount options on the available file systems (including
experimental ones).

If you're interested, you can see the results at
http://www.fuzzy.cz/bench/ although that's the first round, with runs
that were too short (just 5 minutes) and some slightly misconfigured
options (shared buffers and checkpoints).

> And I keep seeing too many data corruption issues on ext4 to recommend
> anyone use it yet for PostgreSQL, that's why I focused on XFS.  ext4
> still needs at least a few more months before all the bug fixes it's
> gotten in later kernels are backported to the 2.6.32 versions deployed
> in RHEL6 and Debian Squeeze, the newest Linux distributions my customers
> care about right now.  On RHEL6 for example, go read
> http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/6.1_Technical_Notes/kernel.html
> , specifically BZ#635199, and you tell me if that sounds like it's
> considered stable code yet or not.  "The block layer will be updated in
> future kernels to provide this more efficient mechanism of ensuring
> ordering...these future block layer improvements will change some kernel
> interfaces..."  Yikes, that does not inspire confidence to me.

XFS is naturally much more mature / stable than ext4, but I'm not quite
sure I want to judge the stability of code based on a comment in release
notes. As I understand it, the comment says something like "things are
not working as efficiently as they should, we'll improve that in the
future" and it relates to the block layer as a whole, not just to
specific file systems. But I don't have access to bug #635199, so maybe
I missed something.

regards
Tomas

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)