Quoting Tom Lane <tgl@xxxxxxxxxxxxx>:
jao@xxxxxxxxxxxx writes:
I have this table and index:
create table t(id int, hash int);
create index idx_t on t(hash);
The value of the hash column, which is indexed, is a pseudo-random
number. I load the table and measure the time per insert.
What I've observed is that inserts slow down as the table grows to
1,000,000 records. Observing the pg_stat* tables, I see that the data
page reads per unit time stay steady, but that index page reads grow
quickly, (shared_buffers was set to 2000).
Define "quickly" ... the expected behavior is that cost to insert into
a btree index grows roughly as log(N). Are you seeing anything worse
than that?
No, that's not what I'm seeing. The index block reads start low, and
rise quickly to an approximate plateau. I've placed my test program
and results here: http://geophile.com/insert_slowdown.
- InsertPerformance.java: The test program (using the 8.0 JDBC driver
and a 7.4.8 database. The database and test are all running on my
laptop).
- block_reads.jpg: Graph of data and index block reads, as reported by
the pgstat_ tables, sampled every 15 seconds, (for a load of 1,000,000
rows).
- insert_rate_vs_inserts.jpg: Graph of insert rate as a function of
#rows inserted.
- insert_rate_vs_time.jpg: Graph of insert rate as a function of wall
clock time.
shared_buffers of 2000 is generally considered too small for high-volume
databases.
Understood. I set the value low to quickly test the idea that the
index cache hit rate was the issue.
Numbers like 10000-50000 are considered reasonable on modern
hardware.
These values are OK for 7.4.8? I've been using 8000. I thought I
remembered reading that 12000-15000 was the top end of what would be
reasonable, but I don't have a reference, and I don't think I've ever
heard a rationale for such limits.
Jack Orenstein