On Wed, 27 Aug 2008 10:32:43 +0200
Ivan Sergio Borgonovo <mail@xxxxxxxxxxxxxxx> wrote:

> create or replace function UpdateAuthors()
> returns void
> as
> $$
> declare
>   _row record;
>   _ItemID bigint;
>   _Authors varchar(1024);
>   _Name varchar(50);
> begin
>   _Authors := '';
>   _ItemID := null;
>   for _row in
>     select a.Name, ia.ItemID from {catalog_itemauthor} ia
>       join {catalog_author} a on a.AuthorID = ia.AuthorID
>       order by ia.ItemID
>   loop
>     if (_row.ItemID <> _ItemID) then
>       if (length(_Authors) > 2) then
>         -- drop the leading ', ' before storing
>         _Authors := substring(_Authors from 3);
>         update {catalog_items} set Authors = _Authors
>           where ItemID = _ItemID;
>       end if;
>       _Authors := '';
>     end if;
>     _ItemID := _row.ItemID;
>     _Name := trim(' \t' from _row.Name);
>     if (length(_Name) > 0) then
>       _Authors := _Authors || ', ' || _Name;
>     end if;
>   end loop;
>   -- flush the last group, otherwise the final ItemID never gets updated
>   if (length(_Authors) > 2) then
>     _Authors := substring(_Authors from 3);
>     update {catalog_items} set Authors = _Authors
>       where ItemID = _ItemID;
>   end if;
>   return;
> end;
> $$ language plpgsql volatile;

Replying to myself:

update catalog_items set Authors = s.Authors
from (
  select ia.ItemID,
    array_to_string(array_accum(trim(' \t' from a.Name)), ', ') as Authors
  from catalog_itemauthor ia
  join catalog_author a on a.AuthorID = ia.AuthorID
  where a.Name is not null
    and length(trim(' \t' from a.Name)) > 1
  group by ia.ItemID
) as s
where s.ItemID = catalog_items.ItemID;

but this looks much slower than the function:
function: 113 sec vs. single statement: 488 sec.
I repeated the test 3 times with similar results.

Can anybody explain why the aggregate underperforms so badly?
I just read that most of the time I can't be smarter than the planner,
and I thought this would be one of the circumstances where the planner
could outperform my handwritten function.

Here is the explain:

"Hash Join  (cost=137408.51..271599.78 rows=209674 width=221)"
"  Hash Cond: ("outer".itemid = "inner".itemid)"
"  ->  HashAggregate  (cost=32994.81..36664.11 rows=209674 width=58)"
"        ->  Hash Join  (cost=8544.62..31946.44 rows=209674 width=58)"
"              Hash Cond: ("outer".authorid = "inner".authorid)"
"              ->  Seq Scan on catalog_itemauthor ia  (cost=0.00..10297.21 rows=629021 width=12)"
"              ->  Hash  (cost=8309.00..8309.00 rows=94248 width=54)"
"                    ->  Seq Scan on catalog_author a  (cost=0.00..8309.00 rows=94248 width=54)"
"                          Filter: ((name IS NOT NULL) AND (length(btrim((name)::text, E' \011'::text)) > 1))"
"  ->  Hash  (cost=79538.96..79538.96 rows=833496 width=189)"
"        ->  Seq Scan on catalog_items  (cost=0.00..79538.96 rows=833496 width=189)"

Thanks to Pavel, who suggested the use of array_to_string.

Sam... I made the same error you did the first time I wrote the above
statement... missing the where clause. But I can't understand the need
for your longer version "to ensure that characters trimmed from the
authors' name are consistent".

I prefer to put a constraint on the length of the varchar as an early
warning for something unexpected, e.g. if the loop were not working as
expected I'd get an error. In fact that's exactly what happened during
development of the above function. If Authors were a text field I'd be
waiting forever, filling the DB with rubbish, etc.

There are no unlimited fields... but there are fields whose upper limit
may be hard to guess. 1024 is actually an inherited choice... but I did
some research to find a reasonable upper limit (highest number of
authors * longest author name).

thanks

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it
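
PS: in case someone wants to reproduce the single statement above, note
that array_accum is not a built-in aggregate; a definition like the
example from the PostgreSQL docs should do:

-- user-defined aggregate that collects values into an array,
-- as shown in the "User-Defined Aggregates" section of the docs
create aggregate array_accum (anyelement)
(
    sfunc = array_append,
    stype = anyarray,
    initcond = '{}'
);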