Re: improving write performance for logging application

Tom Lane wrote:
Steve Eckmann <eckmann@xxxxxxxxxxxx> writes:
  
We also found that we could improve MySQL performance significantly
using MySQL's "INSERT" command extension allowing multiple value-list
tuples in a single command; the rate for MyISAM tables improved to
about 2600 objects/second. PostgreSQL doesn't support that language
extension. Using the COPY command instead of INSERT might help, but
since rows are being generated on the fly, I don't see how to use COPY
without running a separate process that reads rows from the
application and uses COPY to write to the database.
    

Can you conveniently alter your application to batch INSERT commands
into transactions?  Ie

	BEGIN;
	INSERT ...;
	... maybe 100 or so inserts ...
	COMMIT;
	BEGIN;
	... lather, rinse, repeat ...

This cuts down the transactional overhead quite a bit.  A downside is
that you lose multiple rows if any INSERT fails, but then the same would
be true of multiple VALUES lists per INSERT.
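
In libpq terms that's roughly the following (a sketch only; conn is your connection handle, and buildInsert(), events[], and nPending stand in for however you buffer events and generate each INSERT string):

    PGresult *res = PQexec(conn, "BEGIN");
    PQclear(res);
    for (int i = 0; i < nPending; i++) {
        /* one INSERT per buffered event; buildInsert() is a placeholder */
        res = PQexec(conn, buildInsert(events[i]));
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
            fprintf(stderr, "INSERT failed: %s", PQerrorMessage(conn));
        PQclear(res);
    }
    res = PQexec(conn, "COMMIT");
    PQclear(res);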

			regards, tom lane
  
Thanks for the suggestion, Tom. Yes, I think I could do that. But I thought what I was doing now was effectively the same, because the PostgreSQL 8.0.0 Documentation says (section 27.3.1): "It is allowed to include multiple SQL commands (separated by semicolons) in the command string. Multiple queries sent in a single PQexec call are processed in a single transaction...."

Our simulation application has nearly 400 event types, each of which is a C++ class for which we have a corresponding database table. So every thousand events or so I issue one PQexec() call for each event type that has unlogged instances, sending INSERT commands for all instances. For example,

    PQexec(dbConn, "INSERT INTO FlyingObjectState VALUES (...); INSERT INTO FlyingObjectState VALUES (...); ...");

My thought was that this would be a good compromise between minimizing transactions (one per event class per buffering interval instead of one per event) and minimizing disk seeking (since all queries in a single transaction insert rows into the same table). Am I overlooking something here?

One thing I haven't tried is increasing the buffering interval from 1000 events to, say, 10,000. It turns out that 1000 is a good number for Versant, the object database system we're replacing, and for MySQL, so I assumed it would be a good number for PostgreSQL, too.
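
If COPY does turn out to matter, the libpq COPY functions (PQputCopyData and PQputCopyEnd) look like they would let the application stream rows itself rather than going through a separate process. A rough, untested sketch of one per-table flush, where eventToTabLine() is just a placeholder for formatting an object as a tab-separated, newline-terminated text row:

    PGresult *res = PQexec(dbConn, "COPY FlyingObjectState FROM STDIN");
    if (PQresultStatus(res) == PGRES_COPY_IN) {
        PQclear(res);
        for (int i = 0; i < nPending; i++) {
            /* eventToTabLine() is a placeholder for the row formatter */
            const char *line = eventToTabLine(events[i]);
            PQputCopyData(dbConn, line, strlen(line));
        }
        PQputCopyEnd(dbConn, NULL);    /* NULL error message = finish normally */
        res = PQgetResult(dbConn);     /* final status of the COPY as a whole */
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
            fprintf(stderr, "COPY failed: %s", PQerrorMessage(dbConn));
    }
    PQclear(res);

Whether that would actually beat the batched INSERTs for our row sizes, I don't know.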

Regards,  Steve

