On 07/11/2011 08:26 AM, Robert Klemme wrote:
On Mon, Jul 11, 2011 at 3:13 PM, ktm@xxxxxxxx<ktm@xxxxxxxx> wrote:
I do not know if this makes sense in PostgreSQL and that readers
do not block writers and writes do not block readers. Are your
UPDATEs to individual rows, each in a separate transaction, or
do you UPDATE multiple rows in the same transaction? If you
perform multiple updates in a single transaction, you are
synchronizing the changes to that set of rows and that constraint
is causing other readers that need to get the correct values post-
transaction to wait until the COMMIT completes. This means that
the WAL write must be completed.
What readers should that be? Docs explicitly state that readers are
never blocked by writers:
http://www.postgresql.org/docs/9.0/interactive/mvcc-intro.html
http://www.postgresql.org/docs/9.0/interactive/mvcc.html
From what I understand about this issue the observed effect must be
caused by the implementation and not by a conceptual issue with
transactions.
Have you tried disabling synchronous_commit? If this scenario
holds, you should be able to reduce the slowdown by un-batching
your UPDATEs, as counter-intuitive as that is. This seems to
be similar to a problem that I have been looking at with using
PostgreSQL as the backend to a Bayesian engine. I am following
this thread with interest.
I don't think this will help (see above). Also, I would be very
cautious to do this because although the client might get a faster
acknowledge the DB still has to do the same work as without
synchronous_commit (i.e. WAL, checkpointing etc.) but it still has to
do significantly more transactions than in the batched version.
Typically there is an optimum batch size: if batch size is too small
(say, one row) the ratio of TX overhead to "work" is too bad. If
batch size is too large (say, millions of rows) you hit resource
limitations (memory) which inevitable force the RDBMS to do additional
disk IO.
Kind regards
robert
Thanks Ken and Robert,
What I am observing is definitely not readers blocked by writers by
means of row-level locking.
This seems to be some implementation detail in Postgres about how dirty
pages (or dead versions of tuples) are
flushed to the disk (see the other parts of this thread) when they
accessed by a SELECT query.
The batch size in this case is one SELECT statement accessing 10000 rows
via an aggregate (such as COUNT) and
an UPDATE updating 10000 rows in a single statement.
I am not trying to optimize this particular use case, but rather to
understand what Postgres is doing, and why SELECT
queries are affected negatively (sometimes severely) by concurrent (or
even preceding) UPDATEs at all when the
database resides in the cache completely.
-- Lars
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance