Re: UPDATEDs slowing SELECTs in a fully cached database

"ktm@xxxxxxxx" <ktm@xxxxxxxx> · Mon, 11 Jul 2011 08:13:48 -0500

Hi Lars,

I do not know if this makes sense in PostgreSQL and that readers
do not block writers and writes do not block readers. Are your
UPDATEs to individual rows, each in a separate transaction, or
do you UPDATE multiple rows in the same transaction? If you
perform multiple updates in a single transaction, you are
synchronizing the changes to that set of rows and that constraint
is causing other readers that need to get the correct values post-
transaction to wait until the COMMIT completes. This means that
the WAL write must be completed.

Have you tried disabling synchronous_commit? If this scenario
holds, you should be able to reduce the slowdown by un-batching
your UPDATEs, as counter-intuitive as that is. This seems to
be similar to a problem that I have been looking at with using
PostgreSQL as the backend to a Bayesian engine. I am following
this thread with interest.

Regards,
Ken

On Thu, Jul 07, 2011 at 04:56:13PM -0700, lars wrote:
> I am doing some research that will hopefully lead to replacing a big
> Oracle installation with a set PostgreSQL servers.
> 
> The current Oracle installations consists of multiple of RAC
> clusters with 8 RAC nodes each. Each RAC node has 256gb of
> memory (to be doubled soon).
> The nature of our service is such that over a reasonable time (a day
> or so) the database *is* the working set.
> 
> So I am looking at Postgres in a context where (almost) all of the
> data is cached and disk IO is only required for persistence.
> 
> Setup:
> PostgreSQL 9.1beta2 on a high memory (~68gb, 12 cores) EC2 Linux
> instance (kernel 2.6.35) with the database and
> WAL residing on the same EBS volume with EXT4 (data=ordered,
> barriers=1) - yes that is not an ideal setup
> (WAL should be on separate drive, EBS is slow to begin, etc), but I
> am mostly interested in read performance for a fully cached
> database.
> 
> shared_buffers: varied between 1gb and 20gb
> checkpoint_segments/timeout: varied accordingly between 16-256 and
> 5-10m, resp.
> bgwriter tweaked to get a good distribution of checkpoints,
> bg-writes, and backend writes.
> wal_sync_method: tried fdatasync and open_datasync.
> 
> I read "PostgreSQL 9.0 high performance", and have spent some
> significant amount of time on this already.
> 
> PostgreSQL holds up extremely well, once things like "storing
> hint-bits", checkpoints vs bgwriter vs backend_writes, etc
> are understood. I installed pg_buffercache and pgfincore to monitor
> how and where the database is stored.
> 
> There is one observation that I wasn't able to explain:
> A SELECT only client is severely slowed down by a concurrent client
> performing UPDATES on the same table the other
> client selects from, even when the database resides 100% in the
> cache (I tried with shared_buffers large enough to hold
> the database, and also with a smaller setting relying on the OS
> cache, the behavior is the same).
> 
> As long as only the reader is running I get great performance
> (20-30ms, query reading a random set of about 10000 rows
> out of 100m row table in a single SELECT). The backend is close to
> 100% cpu, which is what want in a cached database.
> 
> Once the writer starts the read performance drops almost immediately
> to >200ms.
> The reading backend's cpu drop drop to <10%,  and is mostly waiting
> (D state in top).
> The UPDATE touches a random set of also about 10000 rows (in one
> update statement, one of the columns touched is
> indexed - and that is the same index used for the SELECTs).
> 
> What I would have expected is that the SELECTs would just continue
> to read from the cached buffers (whether dirtied
> or not) and not be affected by concurrent updates. I could not find
> anything explaining this.
> 
> The most interesting part:
> that this does not happen with an exact clone of that relation but
> UNLOGGED. The same amount of buffers get dirty,
> the same amount checkpointing, bgwriting, vacuuming. The only
> difference is WAL maintenance as far as I can tell.
> 
> Is there some (intentional or not) synchronization between backend
> when the WAL is maintained? Are there times when
> read only query needs to compete disk IO when everything is cached?
> Or are there any other explanations?
> 
> I am happy to provide more information. Although I am mainly looking
> for a qualitative answer, which could explain this behavior.
> 
> Thanks.
> 
> -- Lars
> 
> 
> -- 
> Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
> 

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance