Re: CREATE INDEX rather sluggish

Simon Riggs <simon@xxxxxxxxxxxxxxx> · Thu, 30 Mar 2006 18:08:44 +0100

On Thu, 2006-03-30 at 09:26 +0100, Gavin Hamill wrote: 
> Hullo, I have pg 8.1.3 on an 8-CPU AIX 5.3 box with 16GB of RAM, and I'm 
> finding that it's taking an age to CREATE INDEX on a large table:
> 
>     Column     |          Type          |                              Modifiers
> ----------------+------------------------+---------------------------------------------------------------------
>  ID             | integer                | not null default nextval(('public.keyword_id_seq'::text)::regclass)
>  Text           | character varying(200) |
>  Longitude      | numeric(16,5)          |
>  Latitude       | numeric(16,5)          |
>  AreaID         | integer                |
>  SearchCount    | integer                | not null default 0
>  Radius         | integer                |
>  LanguageID     | integer                |
>  KeywordType    | character varying(20)  |
>  LowerText      | character varying(200) |
>  NumberOfHotels | integer                |
>  CountryID      | integer                |
>  FriendlyText   | character varying(200) |
> Indexes:
> 
> 
> 2006-03-29 21:39:38 BST LOG:  duration: 41411.625 ms  statement: CREATE INDEX ix_keyword_areaid ON "Keyword" USING btree ("AreaID");
> 2006-03-29 21:42:46 BST LOG:  duration: 188550.644 ms  statement: CREATE INDEX ix_keyword_lonlat ON "Keyword" USING btree ("Longitude", "Latitude");
> 2006-03-29 21:46:41 BST LOG:  duration: 234864.571 ms  statement: CREATE INDEX ix_keyword_lowertext ON "Keyword" USING btree ("LowerText");
> 2006-03-29 21:52:32 BST LOG:  duration: 350757.565 ms  statement: CREATE INDEX ix_keyword_type ON "Keyword" USING btree ("KeywordType");
> 
> The table has just under six million rows - should it really be taking 
> nearly six minutes to add an index? These log snippets were taking 
> during a pg_restore on a newly created db, so there should be no issues 
> with the table needing vacuuming.

The index build time varies according to the number and type of the
datatypes, as well as the distribution of values in the table. As well
as the number of rows in the table.

Note the x10 factor to index AreaID (integer) v KeywordType (vchar(20))

> What parameters in the postgresql.conf are pertinent here? I have
> 
> shared_buffers 120000
> work_mem 16384
> maintenance_work_mem = 262144

Try trace_sort = on and then rerun the index builds to see what's
happening there. We've speeded sort up by about 2.5 times in the current
development version, but it does just run in single threaded mode so
your 8 CPUs aren't helping there.

Looks like you might be just over the maintenance_work_mem limit for the
last index builds. You can try doubling maintenance_work_mem.

The extended runtime for KeywordType is interesting in comparison to
LowerText, which on the face of it is a longer column. My guess would be
that LowerText is fairly unique and sorts quickly, whereas KeywordType
is fairly non-unique with a high average row length that require
complete string comparison before deciding it is actually the same
value. You might want to try using codes rather than textual
KeywordTypes. 

You might try using partial indexes also, along the lines of

CREATE INDEX ix_keyword_type ON "Keyword" USING btree ("KeywordType") WHERE KeywordType IS NOT NULL;

Best Regards, Simon Riggs