For long involved reasons I'm hanging out late at work today, and rather
than doing real, productive work, I thought I'd run some benchmarks
against our development PostgreSQL database server. My conclusions are
at the end.
The purpose of the benchmarking was to find out how fast Postgres was,
or to compare Postgres to other databases, but to instead answer the
question: when does it become worthwhile to switch over to using COPYs
instead of INSERTS, and by how much? This benchmark should in no way be
used to gauge absolute performance of PostgreSQL.
The machine in question: a new HP-145 rack mount server, with a
single-socket dual-core 1.8GHz Opteron 275, 1M of cache per core, with
4G of memory, running Redhat Linux (forget which version). Database was
on the local single SATA hard disk- no raid. From the numbers, I'm
assuming the disk honors fsync. Some tuning of the database was done,
specifically shared_buffers was upped to 2500 and temp_buffers to 1500
(mental note to self: must increase these signifigantly more. Forgot
they were so low). fsync is definately on. Test program was written in
Ocaml, compiled to native code, using the Ocaml Postgresql connection
library (Ocaml bindings of the libpgsql library). The test was single
threaded- only one insert going on at a time, run over the local gigabit
ethernet network from a remote machine.
The table design was very simple:
CREATE TABLE copytest (
id SERIAL PRIMARY KEY NOT NULL,
name VARCHAR(64),
thread INT,
block INT,
num INT);
The id column was not specified either in the inserts or in the copies,
instead it just came from the sequence. Other than the id, there are no
indexes on the table. Numbers are approximate.
Results:
Inserts, 1 per transaction* 83 inserts/second
Inserts, 5 per transaction 419 inserts/second
Inserts, 10 per transaction 843 inserts/second
Inserts, 50 per transaction ~3,100 inserts/second
Inserts, 100 per transaction ~4,200 inserts/second
Inserts, 1,000 per transaction ~5,400 inserts/second
Copy, 5 element blocks ~405 inserts/second
Copy, 10 element blocks ~700 inserts/second
Copy, 50 element blocks ~3,400 inserts/second
Copy, 100 element blocks ~6,000 inserts/second
Copy, 1,000 element blocks ~20,000 inserts/second
Copy, 10,000 element blocks ~27,500 inserts/second
Copy, 100,000 element blocks ~27,600 inserts/second
* The singleton inserts were not done in an explicit begin/end block,
but were instead "unadorned" inserts.
Some conclusions:
1) Transaction time is a huge hit on the small block sizes. Going from
1 insert per transaction to 10 inserts per transaction gives a 10x speed
up. Once the block size gets large enough (10's to 100's of elements
per block) the cost of a transaction becomes less of a problem.
2) Both insert methods hit fairly hard walls of diminishing returns were
larger block sizes gave little performance advantage, tending to no
performance advantage.
3) For small enough block sizes, inserts are actually faster than
copies- but not by much. There is a broad plateau, spanning at least
the 5 through 100 elements per block (more than an order of magnitude),
where the performance of the two are roughly identical. For the general
case, I'd be inclined to switch to copies sooner (at 5 or so elements
per block) rather than later.
4) At the high end, copies vastly outperformed inserts. At 1,000
elements per block, the copy was almost 4x faster than inserts. This
widened to ~5x before copy started topping out.
5) The performance of Postgres, at least on inserts, depends critically
on how you program it. One the same hardware, performance for me varied
over a factor of over 300-fold, 2.5 orders of magnitude. Programs which
are unaware of transactions and are designed to be highly portable are
likely to hit the abysmal side of performance, where the transaction
overhead kills performance. I'm not sure there is a fix for this (let
alone an easy fix)- simply dropping transactions is obviously not it.
Programs that are transaction aware and willing to use
PostgreSQL-specific features can get surprisingly excellent
performance. Simply being transaction-aware and doing multiple inserts
per transaction greatly increases performance, giving an easy order of
magnitude increase (wrapping 10 inserts in a transaction gives a 10x
performance boost).
Brian