Heikki,
Thanks for the response.
Heikki Linnakangas wrote:
Vladimir Stankovic wrote:
I'm running write-intensive, TPC-C-like tests. The workload consists
of 150 to 200 thousand transactions. The performance varies
dramatically, between 5 and more than 9 hours (I don't have the exact
figure for the longest experiment). Initially the server is
relatively fast: it finishes the first batch of 50k transactions in
an hour, probably because the database is RAM-resident during this
interval. As soon as the database grows bigger than RAM, the
performance, not surprisingly, degrades because of the slow disks.
My problem is that the performance is rather variable and, to me,
non-deterministic. A 150k test can finish in approx. 3h30min, but
conversely it can take more than 5h to complete.
Preferably I would like to see *steady-state* performance (where my
interpretation of steady state is that the average throughput/response
time does not change over time). Is a steady state achievable despite
MVCC and the inherent non-determinism between experiments? What could
be the reasons for the variable performance?
Steadiness is relative; you'll never achieve perfectly steady
performance where every transaction takes exactly X milliseconds. That
said, PostgreSQL is by nature not as steady as many other DBMSs,
because of the need to vacuum. Another significant source of
unsteadiness is checkpoints, though that's not as bad with fsync=off,
as you're running.
What I am hoping to see is NOT the same value for every execution of
the same type of transaction (after some transient period). Instead, I'd
like to see that if I take an appropriately sized set of transactions I
will see at least a steady growth in average transaction times, if not
exactly the same average. Each chunk might well include a sudden
performance drop due to the necessary vacuums and checkpoints. The
performance might also be influenced by changes in the data set.
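One concrete way to check this notion of steady state is to split the per-transaction response times into fixed-size chunks and compare the chunk averages over the run. A minimal sketch (the function name and the sample numbers are mine, purely illustrative):

```python
# Hypothetical illustration: judge "steady state" by comparing the average
# response time of consecutive, equally sized chunks of transactions.
def chunk_averages(latencies, chunk_size):
    """Return the mean latency of each full consecutive chunk."""
    return [sum(latencies[i:i + chunk_size]) / chunk_size
            for i in range(0, len(latencies) - chunk_size + 1, chunk_size)]

# Example latencies in ms; one vacuum/checkpoint-induced spike in chunk 1.
lat = [10, 12, 11, 13, 50, 11, 12, 10, 11, 12]
print(chunk_averages(lat, 5))  # -> [19.2, 11.2]
```

If the run is steady in this sense, the chunk averages should stay roughly constant (or grow smoothly), even though individual chunks absorb the occasional vacuum or checkpoint spike.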
I am unhappy that the durations of experiments can differ by as much as
30% (bearing in mind that the runs are not exactly the same, due to the
non-determinism on the client side). I would like to eliminate this
variability. Are my expectations reasonable? What could be the cause(s)
of this variability?
I'd suggest using vacuum_cost_delay to throttle vacuums so that
they don't disturb other transactions as much. You might also want to
set up manual vacuums for the bigger tables, instead of relying on
autovacuum, because until the recent changes in CVS HEAD, autovacuum
can only vacuum one table at a time, and while it's vacuuming a big
table, the smaller heavily-updated tables are neglected.
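As a rough starting point, that could look something like the fragment below (the values, database name, and table names are placeholders, not recommendations; note that in 8.1 vacuum_cost_delay is a plain integer number of milliseconds):

```
# postgresql.conf -- throttle vacuum I/O (hypothetical values, tune to taste)
vacuum_cost_delay = 20     # ms to sleep each time the cost limit is reached
vacuum_cost_limit = 200    # accumulated I/O cost that triggers the sleep

# crontab -- vacuum the big tables on your own schedule, leaving
# autovacuum free to keep up with the small, heavily-updated tables
*/30 * * * *  psql -d tpcc -c "VACUUM ANALYZE stock; VACUUM ANALYZE order_line;"
```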
The database server version is 8.1.5 running on Fedora Core 6.
How about upgrading to 8.2? You might also want to experiment with CVS
HEAD to get the autovacuum improvements, as well as a bunch of other
performance improvements.
I will try these, but as I said my primary goal is to have
steady/'predictable' performance, not necessarily to obtain the fastest
PG results.
Best regards,
Vladimir
--
Vladimir Stankovic T: +44 20 7040 0273
Research Student/Research Assistant F: +44 20 7040 8585
Centre for Software Reliability E: V.Stankovic@xxxxxxxxxx
City University
Northampton Square, London EC1V 0HB