I have since moved the WAL to its own EBS volume (ext4, data=writeback)
to make it easier to monitor IO.
The times when the SELECTs slow down coincide with heavy write traffic
to the WAL volume.
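One way to line that up from inside the database (just a sketch, nothing fancy) is to sample the WAL insert position together with a timestamp; the delta between successive samples is roughly the amount of WAL generated in that interval:

  -- sample the current WAL insert position; diff successive samples to get
  -- a rough WAL write rate to correlate with the SELECT latencies
  SELECT now() AS sample_time, pg_current_xlog_insert_location();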
Maybe this has to do with WALInsertLock or WALWriteLock (or some other
lock).
Since the slowdown was less severe with the WAL on its own volume, it seems
some exclusive lock on the pages in
shared_buffers is held while WAL IO is in progress(?) - that would be
"frustrating". (wal_buffers defaults to 16MB in my setup.)
Next I am going to have a look at the code. I would be thankful for any
further insights, though :)
Thanks.
-- Lars
On 07/07/2011 04:56 PM, lars wrote:
I am doing some research that will hopefully lead to replacing a big
Oracle installation with a set of PostgreSQL servers.
The current Oracle installation consists of multiple RAC clusters
with 8 RAC nodes each. Each RAC node has 256GB of
memory (to be doubled soon).
The nature of our service is such that over a reasonable time (a day
or so) the database *is* the working set.
So I am looking at Postgres in a context where (almost) all of the
data is cached and disk IO is only required for persistence.
Setup:
PostgreSQL 9.1beta2 on a high memory (~68gb, 12 cores) EC2 Linux
instance (kernel 2.6.35) with the database and
WAL residing on the same EBS volume with EXT4 (data=ordered,
barriers=1) - yes that is not an ideal setup
(the WAL should be on a separate drive, EBS is slow to begin with, etc.), but I am
mostly interested in read performance for a fully cached database.
shared_buffers: varied between 1gb and 20gb
checkpoint_segments/timeout: varied accordingly between 16-256 and
5-10m, resp.
bgwriter tweaked to get a good distribution of checkpoints, bg-writes,
and backend writes.
wal_sync_method: tried fdatasync and open_datasync.
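(In case it is useful: checking the write distribution and the settings actually in effect is just a matter of plain catalog queries, something along these lines:

  -- split of writes between checkpoints, the bgwriter, and the backends
  SELECT checkpoints_timed, checkpoints_req,
         buffers_checkpoint, buffers_clean, buffers_backend
  FROM pg_stat_bgwriter;

  -- settings actually in effect for the current run
  SELECT name, setting, unit
  FROM pg_settings
  WHERE name IN ('shared_buffers', 'checkpoint_segments',
                 'checkpoint_timeout', 'wal_buffers', 'wal_sync_method');
)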
I read "PostgreSQL 9.0 high performance", and have spent some
significant amount of time on this already.
PostgreSQL holds up extremely well, once things like "storing
hint-bits", checkpoints vs bgwriter vs backend_writes, etc
are understood. I installed pg_buffercache and pgfincore to monitor
how and where the database is stored.
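For example, a pg_buffercache query along these lines shows how many shared buffers each relation occupies and how many of them are dirty:

  -- per-relation view of shared_buffers usage and dirty buffers
  SELECT c.relname,
         count(*) AS buffers,
         sum(CASE WHEN b.isdirty THEN 1 ELSE 0 END) AS dirty
  FROM pg_buffercache b
  JOIN pg_class c
    ON b.relfilenode = pg_relation_filenode(c.oid)
   AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                             WHERE datname = current_database()))
  GROUP BY c.relname
  ORDER BY buffers DESC
  LIMIT 10;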
There is one observation that I wasn't able to explain:
A SELECT-only client is severely slowed down by a concurrent client
performing UPDATEs on the same table the other
client selects from, even when the database resides 100% in the cache
(I tried with shared_buffers large enough to hold
the database, and also with a smaller setting relying on the OS cache;
the behavior is the same).
As long as only the reader is running I get great performance
(20-30ms for a query reading a random set of about 10,000 rows
out of a 100m row table in a single SELECT). The backend is close to
100% CPU, which is what I want in a cached database.
Once the writer starts, the read performance drops almost immediately
to >200ms.
The reading backend's CPU usage drops to <10%, and it is mostly waiting (D
state in top).
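For what it's worth, pg_stat_activity in 9.1 only flags waits on heavyweight locks, so a check like the sketch below mostly just confirms which backend is the slow one; it cannot show lightweight-lock or plain IO waits:

  -- rough look at what the backends are doing while the reader is slow
  SELECT procpid, usename, waiting,
         now() - query_start AS running_for,
         current_query
  FROM pg_stat_activity
  WHERE current_query <> '<IDLE>'
  ORDER BY query_start;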
The UPDATE also touches a random set of about 10,000 rows (in one
update statement; one of the columns touched is
indexed, and that is the same index used for the SELECTs).
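Stripped of the actual names (my_table and my_indexed_col are just placeholders here), the workload is essentially of this shape:

  -- reader: a single SELECT fetching a random set of rows via the index
  -- (the real workload passes ~10,000 random key values per statement)
  SELECT *
  FROM my_table
  WHERE my_indexed_col IN (101, 202, 303);

  -- writer: one UPDATE touching a similar random set of rows; it rewrites
  -- my_indexed_col, the same column the reader's index is built on
  UPDATE my_table
  SET my_indexed_col = my_indexed_col + 1
  WHERE my_indexed_col IN (101, 202, 303);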
What I would have expected is that the SELECTs would just continue to
read from the cached buffers (whether dirtied
or not) and not be affected by concurrent updates. I could not find
anything explaining this.
The most interesting part:
this does not happen with an exact clone of that relation that is
UNLOGGED. The same number of buffers get dirty, and there is
the same amount of checkpointing, bgwriting, and vacuuming. The only
difference is WAL maintenance, as far as I can tell.
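Such an unlogged clone can be built along these lines (my_table again being a placeholder name):

  -- build an UNLOGGED copy with the same columns, defaults, and indexes
  CREATE UNLOGGED TABLE my_table_unlogged (LIKE my_table INCLUDING ALL);
  INSERT INTO my_table_unlogged SELECT * FROM my_table;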
Is there some (intentional or not) synchronization between backends
when the WAL is maintained? Are there times when a
read-only query needs to compete for disk IO even when everything is cached? Or
are there any other explanations?
I am happy to provide more information, although I am mainly looking
for a qualitative answer that could explain this behavior.
Thanks.
-- Lars