Some performance numbers, with thoughts

For long involved reasons I'm hanging out late at work today, and rather than doing real, productive work, I thought I'd run some benchmarks against our development PostgreSQL database server. My conclusions are at the end.

The purpose of the benchmarking was not to find out how fast Postgres is, nor to compare Postgres to other databases, but instead to answer the question: when does it become worthwhile to switch over to using COPYs instead of INSERTs, and by how much? This benchmark should in no way be used to gauge the absolute performance of PostgreSQL.

The machine in question: a new HP-145 rack-mount server with a single-socket dual-core 1.8GHz Opteron 275, 1MB of cache per core, and 4GB of memory, running Red Hat Linux (I forget which version). The database was on the local single SATA hard disk; no RAID. From the numbers, I'm assuming the disk honors fsync. Some tuning of the database was done: specifically, shared_buffers was upped to 2500 and temp_buffers to 1500 (mental note to self: must increase these significantly more. Forgot they were so low). fsync is definitely on. The test program was written in OCaml, compiled to native code, using the OCaml postgresql connection library (OCaml bindings of the libpq library). The test was single-threaded (only one insert going on at a time), run over the local gigabit ethernet network from a remote machine.
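The timing side of the test program is nothing exotic. A minimal sketch of the kind of harness involved (hypothetical names, not the actual program; the real benchmark drove the Postgresql bindings instead of a dummy workload):

```ocaml
(* Time one run of f and report a rows/second rate.  In the real
   benchmark, f performed the inserts or copies for [rows] rows. *)
let rate_per_second ~rows f =
  let t0 = Unix.gettimeofday () in
  f ();
  let dt = Unix.gettimeofday () -. t0 in
  float_of_int rows /. dt

let () =
  (* Dummy workload standing in for "insert 100,000 rows". *)
  let work () =
    ignore (Sys.opaque_identity (Array.init 100_000 (fun i -> i * i)))
  in
  Printf.printf "~%.0f rows/second\n" (rate_per_second ~rows:100_000 work)
```

(Compile with the unix library, e.g. `ocamlfind ocamlopt -package unix -linkpkg`.)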

The table design was very simple:
CREATE TABLE copytest (
   id SERIAL PRIMARY KEY NOT NULL,
   name VARCHAR(64),
   thread INT,
   block INT,
   num INT);

The id column was not specified in either the inserts or the copies; it just came from the sequence. Other than the id, there are no indexes on the table. Numbers are approximate.
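For concreteness, the two loading methods differ only in what gets sent over the wire. A sketch of the two formats for this table (hypothetical helper names, not the actual test program; real code should escape or parameterize the values rather than splice them in with Printf):

```ocaml
(* Sketch of the two wire formats for loading rows into copytest. *)

type row = { name : string; thread : int; block : int; num : int }

(* One INSERT statement per row, to be wrapped in a BEGIN/COMMIT block. *)
let insert_sql r =
  Printf.sprintf
    "INSERT INTO copytest (name, thread, block, num) VALUES ('%s', %d, %d, %d);"
    r.name r.thread r.block r.num

(* COPY sends one tab-separated line per row after issuing
   "COPY copytest (name, thread, block, num) FROM STDIN". *)
let copy_line r =
  Printf.sprintf "%s\t%d\t%d\t%d\n" r.name r.thread r.block r.num

let () =
  let r = { name = "test"; thread = 1; block = 2; num = 3 } in
  print_string (insert_sql r);
  print_newline ();
  print_string (copy_line r)
```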

Results:

Inserts, 1 per transaction*             83 inserts/second
Inserts, 5 per transaction             419 inserts/second
Inserts, 10 per transaction            843 inserts/second
Inserts, 50 per transaction         ~3,100 inserts/second
Inserts, 100 per transaction        ~4,200 inserts/second
Inserts, 1,000 per transaction      ~5,400 inserts/second
Copy, 5 element blocks                ~405 inserts/second
Copy, 10 element blocks               ~700 inserts/second
Copy, 50 element blocks             ~3,400 inserts/second
Copy, 100 element blocks            ~6,000 inserts/second
Copy, 1,000 element blocks         ~20,000 inserts/second
Copy, 10,000 element blocks        ~27,500 inserts/second
Copy, 100,000 element blocks       ~27,600 inserts/second

* The singleton inserts were not done in an explicit begin/end block, but were instead "unadorned" inserts.

Some conclusions:

1) Transaction time is a huge hit at small block sizes. Going from 1 insert per transaction to 10 inserts per transaction gives a 10x speedup. Once the block size gets large enough (tens to hundreds of elements per block), the cost of a transaction becomes less of a problem.

2) Both insert methods hit fairly hard walls of diminishing returns, where larger block sizes gave little performance advantage, tending to no performance advantage.
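Conclusion 1 is consistent with a simple two-parameter model: assume each row costs a fixed time r and each transaction a fixed overhead t, so time per row at block size n is r + t/n. Fitting r and t from just the 1-per-transaction and 1,000-per-transaction rates gives a per-transaction overhead of roughly 12 ms, and predicts the intermediate insert rates to within about 25%. A rough back-of-the-envelope sketch (treating two measured rates as exact, so the numbers are only indicative):

```ocaml
(* Cost model: seconds per row at block size n is r +. t /. n, where r
   is per-row cost and t is per-transaction overhead, fitted from the
   measured 1-per-transaction and 1,000-per-transaction insert rates. *)
let r = 1. /. 5400.                     (* ~0.185 ms per row *)
let t = (1. /. 83.) -. r                (* ~11.9 ms per transaction *)
let predicted_rate n = 1. /. (r +. t /. float_of_int n)

let () =
  Printf.printf "transaction overhead: ~%.1f ms\n" (t *. 1000.);
  List.iter
    (fun n ->
      Printf.printf "%5d per transaction: ~%.0f inserts/second\n"
        n (predicted_rate n))
    [5; 10; 50; 100]
```

The model predicts ~391, ~729, ~2,400, and ~3,300 inserts/second for blocks of 5, 10, 50, and 100, against the measured 419, 843, ~3,100, and ~4,200.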

3) For small enough block sizes, inserts are actually faster than copies, but not by much. There is a broad plateau, spanning at least 5 through 100 elements per block (more than an order of magnitude), where the performance of the two is roughly identical. For the general case, I'd be inclined to switch to copies sooner (at 5 or so elements per block) rather than later.

4) At the high end, copies vastly outperformed inserts. At 1,000 elements per block, the copy was almost 4x faster than inserts. This widened to ~5x before copy started topping out.

5) The performance of Postgres, at least on inserts, depends critically on how you program it. On the same hardware, performance for me varied by a factor of over 300 (about 2.5 orders of magnitude). Programs which are unaware of transactions and are designed to be highly portable are likely to hit the abysmal side of performance, where the transaction overhead kills throughput. I'm not sure there is a fix for this (let alone an easy fix); simply dropping transactions is obviously not it. Programs that are transaction-aware and willing to use PostgreSQL-specific features can get surprisingly excellent performance. Simply being transaction-aware and doing multiple inserts per transaction greatly increases performance, giving an easy order-of-magnitude increase (wrapping 10 inserts in a transaction gives a 10x performance boost).
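The batching itself is trivial to express. A minimal sketch of the change that buys the easy 10x (hypothetical helper, not the actual test program; real code would use the library's escaping or parameters rather than literal values):

```ocaml
(* Wrap a batch of statements in one BEGIN/COMMIT block, so the
   per-transaction overhead is paid once per batch instead of per row. *)
let in_transaction stmts =
  String.concat "\n" (("BEGIN;" :: stmts) @ ["COMMIT;"])

let () =
  List.init 10 (fun i ->
      Printf.sprintf
        "INSERT INTO copytest (name, thread, block, num) VALUES ('t', 1, 1, %d);"
        i)
  |> in_transaction
  |> print_endline
```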

Brian



