Re: Can anyone explain this pgbench results?

"Joost Kraaijeveld" <J.Kraaijeveld@xxxxxxxxxx> · Tue, 7 Mar 2006 11:34:18 +0100

Hi Michael,

Michael Fuhr wrote:
>>> Have you tweaked postgresql.conf at all?  If so, what non-default
>>> settings are you using?
>> 
>> Yes, I have tweaked the following settings:
>> 
>> shared_buffers = 40000
>> work_mem = 512000
>> maintenance_work_mem = 512000
>> max_fsm_pages = 40000
>> effective_cache_size = 131072
> 
> Are you sure you need work_mem that high?  How did you decide on
> that value? 
I have used http://www.powerpostgresql.com/Downloads/annotated_conf_80.html , expecting that the differences between 8.0 and 8.1 do not invalidate the recommendations. I have checked with (some) of my (large) queries and adjusted upward untill I had no temp files in the PGDATA/base/DB_OID/pgsql_tmp. (The warning about

> Are all other settings at their defaults?  
Yep.

> No changes to the write ahead log (WAL) or background writer (bgwriter) settings?
No, because the forementioned document explicitely states that it has recomendations on these subjects.

> What version of PostgreSQL are you running?  The paths in your
> original message suggest 8.1.x.
Debian's Ecth 8.1.0-3

> A checkpoint updates the database files with the data from the
> write-ahead log; you're seeing those writes to the database partition.
> The postmaster does checkpoints every checkpoint_timeout seconds
> (default 300) or every checkpoint_segment log segments (default 3);
> it also uses a background writer to trickle pages to the database
> files between checkpoints so the checkpoints don't have as much
> work to do.  I've been wondering if your pgbench runs are being
> affected by that background activity; the fact that you get
> consistently good performance after forcing a checkpoint suggests
> that that might be the case. 
OK, thanks. 

To be sure if I understand it correctly:

1. Every update/insert is first written to a WAL log file which is in the PGDATA/pg_xlog directory. 
2. Routinely the background writer than writes the changes to the PGDATA/base/DB_OID/ directory.
2. Postmaster forces after 300 secs or if the log segments are full (which ever comes first?) a checkpoint so that the WAL log file are empty ( I assume that that are the changes the background writer has not written yet since the last forced checkpont?).

> If you run pgbench several times without intervening checkpoints,
> do your postmaster logs have any messages like "checkpoints are
> occurring too frequently"?  It might be useful to increase
> checkpoint_warning up to the value of checkpoint_timeout and then
> see if you get any such messages during pgbench runs.  If checkpoints
> are happening a lot more often than every checkpoint_timeout seconds
> then try increasing checkpoint_segments (assuming you have the disk
> space).  After doing so, restart the database and run pgbench several
> times without intervening checkpoints and see if performance is
> more consistent.
I will try that this day.

> Note that tuning PostgreSQL for pgbench performance might be
> irrelevant for your actual needs unless your usage patterns happen
> to resemble what pgbench does.

The advantage of using pgbench is a repeatable short command that leads to something that is showing in actual real world usage.

My problem is with the raw performance of my disk array (3Ware 9500S-8 SATA RAID5 controller with 5 disks). I am having *very* serious performance problems if I do large updates on my databases. E.g. an update of 1 (boolean) column in a table (update prototype.customers set deleted = false) that has 368915 records last forever (> 3500 secs ). The only noticable disk activity during such an update is on the disk/partition that has the PGDATA/base/DB_OID/ directory (/dev/sdc, the 3Ware 9800S-8 RAID 5 array). There is *no* noticable disk activity on the disk/partition that hase the PGDATA/pg_xlog directory (/dev/sdb, on a Sil 3114 on-board SAT controller). The throughtput during the update is ~ 2 MB/sec. The thoughtput during a large file copy or running bonnie (a benchmark) is > 40 MB/sec. My primary goal is to understand the differences ( and than sue the guilty ones ;-)), and than maybe either learn to live with it or find a solution. The number of write operations/sec during the update is ~ 2000 /sec. I suspect that the RAID card cannot handle a lot of small write operations (with fsync?) in a short time without performance penalty (and yes, the write cache on the controller is enabled).

Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: J.Kraaijeveld@xxxxxxxxxx
web: www.askesis.nl