UPDATEs slowing SELECTs in a fully cached database

I am doing some research that will hopefully lead to replacing a big Oracle installation with a set of PostgreSQL servers.

The current Oracle installation consists of multiple RAC clusters with 8 RAC nodes each. Each RAC node has 256gb of
memory (to be doubled soon).
The nature of our service is such that over a reasonable time (a day or so) the database *is* the working set.

So I am looking at Postgres in a context where (almost) all of the data is cached and disk IO is only required for persistence.

Setup:
PostgreSQL 9.1beta2 on a high-memory (~68gb, 12 cores) EC2 Linux instance (kernel 2.6.35) with the database and WAL residing on the same EBS volume with EXT4 (data=ordered, barriers=1) - yes, that is not an ideal setup (WAL should be on a separate drive, EBS is slow to begin with, etc.), but I am mostly interested in read performance for a fully cached database.

shared_buffers: varied between 1gb and 20gb
checkpoint_segments/checkpoint_timeout: varied accordingly between 16-256 and 5-10min, respectively. bgwriter tweaked to get a good distribution of checkpoints, bg-writes, and backend writes.
wal_sync_method: tried fdatasync and open_datasync.
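For reference, the settings I varied looked roughly like this (illustrative values from one of the runs, not every combination tried):

```
# postgresql.conf (illustrative values from one run)
shared_buffers = 20GB           # varied between 1GB and 20GB
checkpoint_segments = 256       # varied between 16 and 256
checkpoint_timeout = 10min      # varied between 5 and 10 minutes
bgwriter_delay = 10ms           # tweaked to balance checkpoints,
bgwriter_lru_maxpages = 1000    # bgwriter writes, and backend writes
wal_sync_method = fdatasync     # also tried open_datasync
```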

I read "PostgreSQL 9.0 high performance", and have spent some significant amount of time on this already.

PostgreSQL holds up extremely well, once things like "storing hint-bits", checkpoints vs bgwriter vs backend_writes, etc are understood. I installed pg_buffercache and pgfincore to monitor how and where the database is stored.
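As an example of the kind of monitoring I did, a query along these lines (assuming pg_buffercache is installed in the current database) shows how much of each relation is in shared_buffers and how much of that is dirty:

```sql
-- Illustrative pg_buffercache query (9.1-compatible):
-- cached vs. dirty buffers per relation
SELECT c.relname,
       count(*) AS buffers,
       sum(CASE WHEN b.isdirty THEN 1 ELSE 0 END) AS dirty
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;
```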

There is one observation that I wasn't able to explain:
A SELECT only client is severely slowed down by a concurrent client performing UPDATES on the same table the other client selects from, even when the database resides 100% in the cache (I tried with shared_buffers large enough to hold the database, and also with a smaller setting relying on the OS cache, the behavior is the same).

As long as only the reader is running I get great performance (20-30ms per query, reading a random set of about 10000 rows out of a 100m row table in a single SELECT). The backend is close to 100% CPU, which is what I want in a cached database.

Once the writer starts, read performance drops almost immediately to >200ms. The reading backend's CPU usage drops to <10%, and it is mostly waiting (D state in top). The UPDATE also touches a random set of about 10000 rows (in one UPDATE statement; one of the columns touched is
indexed - and that is the same index used for the SELECTs).
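Roughly, the two workloads look like this (table and column names are made up for illustration; the real schema differs):

```sql
-- Reader: one SELECT fetching ~10000 random rows out of ~100m,
-- via the index on the "val" column
SELECT *
FROM big_table
WHERE val BETWEEN :lo AND :hi;   -- random range covering ~10000 rows

-- Writer: one UPDATE touching ~10000 random rows,
-- modifying the same indexed column the reader uses
UPDATE big_table
SET val = val + 1
WHERE val BETWEEN :lo AND :hi;
```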

What I would have expected is that the SELECTs would just continue to read from the cached buffers (whether dirtied or not) and not be affected by the concurrent updates. I could not find anything explaining this.

The most interesting part:
this does not happen with an exact clone of that relation that is UNLOGGED. The same number of buffers get dirty, and the same amount of checkpointing, bgwriting, and vacuuming occurs. The only difference, as far as I can tell, is WAL maintenance.
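For what it's worth, the unlogged clone was created along these lines (a sketch with a made-up table name):

```sql
-- Create an unlogged copy with the same columns, defaults,
-- constraints, and indexes, then populate it
CREATE UNLOGGED TABLE big_table_unlogged (LIKE big_table INCLUDING ALL);
INSERT INTO big_table_unlogged SELECT * FROM big_table;
```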

Is there some (intentional or not) synchronization between backends when the WAL is maintained? Are there times when a read-only query needs to compete for disk IO even when everything is cached? Or is there another explanation?

I am happy to provide more information, although I am mainly looking for a qualitative answer that could explain this behavior.

Thanks.

-- Lars


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

