UPDATEs slowing SELECTs in a fully cached database

I am doing some research that will hopefully lead to replacing a big Oracle installation with a set of PostgreSQL servers.

The current Oracle installation consists of multiple RAC clusters with 8 RAC nodes each. Each RAC node has 256gb of
memory (to be doubled soon).
The nature of our service is such that over a reasonable time (a day or so) the database *is* the working set.

So I am looking at Postgres in a context where (almost) all of the data is cached and disk IO is only required for persistence.

Setup:
PostgreSQL 9.1beta2 on a high-memory (~68gb, 12 cores) EC2 Linux instance (kernel 2.6.35) with the database and WAL residing on the same EBS volume with EXT4 (data=ordered, barriers=1) - yes, that is not an ideal setup (WAL should be on a separate drive, EBS is slow to begin with, etc.), but I am mostly interested in read performance for a fully cached database.

shared_buffers: varied between 1gb and 20gb
checkpoint_segments/checkpoint_timeout: varied accordingly between 16-256 and 5-10min, respectively. bgwriter tweaked to get a good distribution of checkpoints, bg-writes, and backend writes.
wal_sync_method: tried fdatasync and open_datasync.
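For reference, the settings I varied looked roughly like this (illustrative values from one of the runs, not every combination tried):

```
# postgresql.conf (illustrative values from one run)
shared_buffers = 20GB           # varied between 1GB and 20GB
checkpoint_segments = 256       # varied between 16 and 256
checkpoint_timeout = 10min      # varied between 5 and 10 minutes
bgwriter_delay = 10ms           # tweaked to balance checkpoints,
bgwriter_lru_maxpages = 1000    # bgwriter writes, and backend writes
wal_sync_method = fdatasync     # also tried open_datasync
```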

I read "PostgreSQL 9.0 high performance", and have spent some significant amount of time on this already.

PostgreSQL holds up extremely well, once things like "storing hint-bits", checkpoints vs bgwriter vs backend_writes, etc are understood. I installed pg_buffercache and pgfincore to monitor how and where the database is stored.
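As an example of the kind of monitoring I did, a query along these lines (assuming pg_buffercache is installed in the current database) shows how much of each relation is in shared_buffers and how much of that is dirty:

```sql
-- Illustrative pg_buffercache query (9.1-compatible):
-- cached vs. dirty buffers per relation
SELECT c.relname,
       count(*) AS buffers,
       sum(CASE WHEN b.isdirty THEN 1 ELSE 0 END) AS dirty
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;
```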

There is one observation that I wasn't able to explain:
A SELECT only client is severely slowed down by a concurrent client performing UPDATES on the same table the other client selects from, even when the database resides 100% in the cache (I tried with shared_buffers large enough to hold the database, and also with a smaller setting relying on the OS cache, the behavior is the same).

As long as only the reader is running I get great performance (20-30ms per query, reading a random set of about 10000 rows out of a 100m row table in a single SELECT). The backend is close to 100% CPU, which is what I want in a cached database.

Once the writer starts, read performance drops almost immediately to >200ms. The reading backend's CPU usage drops to <10%, and it is mostly waiting (D state in top). The UPDATE also touches a random set of about 10000 rows (in one UPDATE statement; one of the columns touched is
indexed - and that is the same index used for the SELECTs).
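Roughly, the two workloads look like this (table and column names are made up for illustration; the real schema differs):

```sql
-- Reader: one SELECT fetching ~10000 random rows out of ~100m,
-- via the index on the "val" column
SELECT *
FROM big_table
WHERE val BETWEEN :lo AND :hi;   -- random range covering ~10000 rows

-- Writer: one UPDATE touching ~10000 random rows,
-- modifying the same indexed column the reader uses
UPDATE big_table
SET val = val + 1
WHERE val BETWEEN :lo AND :hi;
```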

What I would have expected is that the SELECTs would just continue to read from the cached buffers (whether dirtied or not) and not be affected by the concurrent updates. I could not find anything explaining this.

The most interesting part:
this does not happen with an exact clone of that relation that is UNLOGGED. The same number of buffers get dirty, and the same amount of checkpointing, bgwriting, and vacuuming occurs. The only difference, as far as I can tell, is WAL maintenance.
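For what it's worth, the unlogged clone was created along these lines (a sketch with a made-up table name):

```sql
-- Create an unlogged copy with the same columns, defaults,
-- constraints, and indexes, then populate it
CREATE UNLOGGED TABLE big_table_unlogged (LIKE big_table INCLUDING ALL);
INSERT INTO big_table_unlogged SELECT * FROM big_table;
```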

Is there some (intentional or not) synchronization between backends when the WAL is maintained? Are there times when a read-only query needs to compete for disk IO even when everything is cached? Or is there another explanation?

I am happy to provide more information, although I am mainly looking for a qualitative answer that could explain this behavior.

Thanks.

-- Lars


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

