Platform: FreeBSD 6.0, Postgresql 8.1.2 compiled from the ports collection.
Not sure if this belongs in performance or bugs..
A pg_restore of my 2.5GB database was taking up to 2 hours to complete
instead of the expected 10-15 minutes. Checking the server it was mostly
CPU bound. Testing further I discovered that is was spending huge
amounts of CPU time creating some indexes.
It took a while to find out, but basically it boils down to this:
If the column that is having the index created has a certain
distribution of values then create index takes a very long time. If the
data values (integer in this case) a fairly evenly distributed then
create index is very quick, if the data values are all the same it is
very quick. I discovered that in the slow cases the column had
approximately half the values as zero and the rest fairly spread out.
One column started off with around 400,000 zeros and the rest of the
following rows spread between values of 1 to 500,000.
I have put together a test case that demonstrates the problem (see
below). I create a simple table, as close in structure to one of my
problem tables and populate an integer column with 100,000 zeros follow
by 100,000 random integers between 0 and 100,000. Then create an index
on this column. I then drop the table and repeat. The create index
should take around 1-2 seconds. A fair proportion of the time it takes
50 seconds!!!
If I fill the same row with all random data the create index always
takes a second or two. If I fill the column with all zeros everything is
still OK.
When my tables that I am trying to restore are over 2 million rows the
creating one index can take an hour!! (almost all CPU time).
All other areas of performance, once the dump is restored and analysed
seem to be OK, even large hash/merge joins and sorts
This is entirely repeatable in FreeBSD in that around half the time
create index will be incredibly slow.
All postgresql.conf settings are at the defaults for the test initially
(fresh install)
The final interesting thing is that as I increase shared buffers to 2000
or 3000 the problem gets *worse*
The following text is output from the test script..
select version();
version
------------------------------------------------------------------------------------------------
PostgreSQL 8.1.2 on i386-portbld-freebsd6.0, compiled by GCC cc (GCC)
3.4.4 [FreeBSD] 20050518
(1 row)
\timing
Timing is on.
----- Many slow cases, note the 50+ seconds cases
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 81.859 ms
insert into atest (i,r,d1,d2) select
generate_series(1,100000),0,now(),now();
INSERT 0 100000
Time: 1482.141 ms
insert into atest (i,r,d1,d2) select
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1543.508 ms
create index idx on atest(r);
CREATE INDEX
Time: 56685.230 ms
drop table atest;
DROP TABLE
Time: 4.616 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 6.889 ms
insert into atest (i,r,d1,d2) select
generate_series(1,100000),0,now(),now();
INSERT 0 100000
Time: 2009.787 ms
insert into atest (i,r,d1,d2) select
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1828.663 ms
create index idx on atest(r);
CREATE INDEX
Time: 3991.257 ms
drop table atest;
DROP TABLE
Time: 3.796 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 19.965 ms
insert into atest (i,r,d1,d2) select
generate_series(1,100000),0,now(),now();
INSERT 0 100000
Time: 1625.059 ms
insert into atest (i,r,d1,d2) select
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 2622.827 ms
create index idx on atest(r);
CREATE INDEX
Time: 1082.799 ms
drop table atest;
DROP TABLE
Time: 4.627 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 2.953 ms
insert into atest (i,r,d1,d2) select
generate_series(1,100000),0,now(),now();
INSERT 0 100000
Time: 2068.744 ms
insert into atest (i,r,d1,d2) select
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 2671.420 ms
create index idx on atest(r);
CREATE INDEX
Time: 8047.660 ms
drop table atest;
DROP TABLE
Time: 3.675 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 2.582 ms
insert into atest (i,r,d1,d2) select
generate_series(1,100000),0,now(),now();
INSERT 0 100000
Time: 1723.987 ms
insert into atest (i,r,d1,d2) select
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 2263.131 ms
create index idx on atest(r);
CREATE INDEX
Time: 50050.308 ms
drop table atest;
DROP TABLE
Time: 52.744 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 25.370 ms
insert into atest (i,r,d1,d2) select
generate_series(1,100000),0,now(),now();
INSERT 0 100000
Time: 2052.733 ms
insert into atest (i,r,d1,d2) select
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 2631.317 ms
create index idx on atest(r);
CREATE INDEX
Time: 61440.897 ms
drop table atest;
DROP TABLE
Time: 26.137 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 24.794 ms
insert into atest (i,r,d1,d2) select
generate_series(1,100000),0,now(),now();
INSERT 0 100000
Time: 2851.977 ms
insert into atest (i,r,d1,d2) select
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1553.046 ms
create index idx on atest(r);
CREATE INDEX
Time: 1774.920 ms
---- Fast (Normal?) cases
drop table atest;
DROP TABLE
Time: 4.422 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 2.543 ms
insert into atest (i,r,d1,d2) select
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1516.246 ms
insert into atest (i,r,d1,d2) select
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1407.400 ms
create index idx on atest(r);
CREATE INDEX
Time: 903.503 ms
drop table atest;
DROP TABLE
Time: 3.820 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 22.861 ms
insert into atest (i,r,d1,d2) select
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1455.556 ms
insert into atest (i,r,d1,d2) select
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 2037.996 ms
create index idx on atest(r);
CREATE INDEX
Time: 718.286 ms
drop table atest;
DROP TABLE
Time: 4.503 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 3.448 ms
insert into atest (i,r,d1,d2) select
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1523.540 ms
insert into atest (i,r,d1,d2) select
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1261.473 ms
create index idx on atest(r);
CREATE INDEX
Time: 727.707 ms
drop table atest;
DROP TABLE
Time: 3.564 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 2.897 ms
insert into atest (i,r,d1,d2) select
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1447.504 ms
insert into atest (i,r,d1,d2) select
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1403.525 ms
create index idx on atest(r);
CREATE INDEX
Time: 754.577 ms
drop table atest;
DROP TABLE
Time: 4.633 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 3.196 ms
insert into atest (i,r,d1,d2) select
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1618.544 ms
insert into atest (i,r,d1,d2) select
generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1530.450 ms
create index idx on atest(r);
CREATE INDEX
Time: 802.980 ms
drop table atest;
DROP TABLE
Time: 4.707 ms
mserver#
Regards,
Gary.