Strange Create Index behaviour

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Platform: FreeBSD 6.0, Postgresql 8.1.2 compiled from the ports collection.

Not sure if this belongs in performance or bugs..

A pg_restore of my 2.5GB database was taking up to 2 hours to complete instead of the expected 10-15 minutes. Checking the server it was mostly CPU bound. Testing further I discovered that is was spending huge amounts of CPU time creating some indexes.

It took a while to find out, but basically it boils down to this:

If the column that is having the index created has a certain distribution of values then create index takes a very long time. If the data values (integer in this case) a fairly evenly distributed then create index is very quick, if the data values are all the same it is very quick. I discovered that in the slow cases the column had approximately half the values as zero and the rest fairly spread out. One column started off with around 400,000 zeros and the rest of the following rows spread between values of 1 to 500,000.

I have put together a test case that demonstrates the problem (see below). I create a simple table, as close in structure to one of my problem tables and populate an integer column with 100,000 zeros follow by 100,000 random integers between 0 and 100,000. Then create an index on this column. I then drop the table and repeat. The create index should take around 1-2 seconds. A fair proportion of the time it takes 50 seconds!!!

If I fill the same row with all random data the create index always takes a second or two. If I fill the column with all zeros everything is still OK.

When my tables that I am trying to restore are over 2 million rows the creating one index can take an hour!! (almost all CPU time).

All other areas of performance, once the dump is restored and analysed seem to be OK, even large hash/merge joins and sorts

This is entirely repeatable in FreeBSD in that around half the time create index will be incredibly slow.

All postgresql.conf settings are at the defaults for the test initially (fresh install)

The final interesting thing is that as I increase shared buffers to 2000 or 3000 the problem gets *worse*

The following text is output from the test script..

select version();
version
------------------------------------------------------------------------------------------------
PostgreSQL 8.1.2 on i386-portbld-freebsd6.0, compiled by GCC cc (GCC) 3.4.4 [FreeBSD] 20050518
(1 row)

\timing
Timing is on.

-----  Many slow cases, note the 50+ seconds cases

create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 81.859 ms
insert into atest (i,r,d1,d2) select generate_series(1,100000),0,now(),now();
INSERT 0 100000
Time: 1482.141 ms
insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1543.508 ms
create index idx on atest(r);
CREATE INDEX
Time: 56685.230 ms

drop table atest;
DROP TABLE
Time: 4.616 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 6.889 ms
insert into atest (i,r,d1,d2) select generate_series(1,100000),0,now(),now();
INSERT 0 100000
Time: 2009.787 ms
insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1828.663 ms
create index idx on atest(r);
CREATE INDEX
Time: 3991.257 ms

drop table atest;
DROP TABLE
Time: 3.796 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 19.965 ms
insert into atest (i,r,d1,d2) select generate_series(1,100000),0,now(),now();
INSERT 0 100000
Time: 1625.059 ms
insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 2622.827 ms
create index idx on atest(r);
CREATE INDEX
Time: 1082.799 ms

drop table atest;
DROP TABLE
Time: 4.627 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 2.953 ms
insert into atest (i,r,d1,d2) select generate_series(1,100000),0,now(),now();
INSERT 0 100000
Time: 2068.744 ms
insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 2671.420 ms
create index idx on atest(r);
CREATE INDEX
Time: 8047.660 ms

drop table atest;
DROP TABLE
Time: 3.675 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 2.582 ms
insert into atest (i,r,d1,d2) select generate_series(1,100000),0,now(),now();
INSERT 0 100000
Time: 1723.987 ms
insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 2263.131 ms
create index idx on atest(r);
CREATE INDEX
Time: 50050.308 ms

drop table atest;
DROP TABLE
Time: 52.744 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 25.370 ms
insert into atest (i,r,d1,d2) select generate_series(1,100000),0,now(),now();
INSERT 0 100000
Time: 2052.733 ms
insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 2631.317 ms
create index idx on atest(r);
CREATE INDEX
Time: 61440.897 ms

drop table atest;
DROP TABLE
Time: 26.137 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 24.794 ms
insert into atest (i,r,d1,d2) select generate_series(1,100000),0,now(),now();
INSERT 0 100000
Time: 2851.977 ms
insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1553.046 ms
create index idx on atest(r);
CREATE INDEX
Time: 1774.920 ms


----  Fast (Normal?) cases

drop table atest;
DROP TABLE
Time: 4.422 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 2.543 ms
insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1516.246 ms
insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1407.400 ms
create index idx on atest(r);
CREATE INDEX
Time: 903.503 ms

drop table atest;
DROP TABLE
Time: 3.820 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 22.861 ms
insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1455.556 ms
insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 2037.996 ms
create index idx on atest(r);
CREATE INDEX
Time: 718.286 ms

drop table atest;
DROP TABLE
Time: 4.503 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 3.448 ms
insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1523.540 ms
insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1261.473 ms
create index idx on atest(r);
CREATE INDEX
Time: 727.707 ms

drop table atest;
DROP TABLE
Time: 3.564 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 2.897 ms
insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1447.504 ms
insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1403.525 ms
create index idx on atest(r);
CREATE INDEX
Time: 754.577 ms

drop table atest;
DROP TABLE
Time: 4.633 ms
create table atest(i int4, r int4,d1 timestamp, d2 timestamp);
CREATE TABLE
Time: 3.196 ms
insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1618.544 ms
insert into atest (i,r,d1,d2) select generate_series(1,100000),random()*100000,now(),now();
INSERT 0 100000
Time: 1530.450 ms
create index idx on atest(r);
CREATE INDEX
Time: 802.980 ms
drop table atest;
DROP TABLE
Time: 4.707 ms
mserver#

Regards,
Gary.


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux