On Thu, Feb 2, 2012 at 9:28 AM, Jon Nelson <jnelson+pgsql@xxxxxxxxxxx> wrote: > I created a table with two columns: an id SERIAL (primary key) and a > text (not null), and then added a unique index on the text field. > Then I ran the following query (with a huge work_mem - 20GB): > > insert into tableA (text_field) select distinct other_text_field from > some_huge_set_of_tables I bet the distinct is being implemented by a hashAggregate. So then you are inserting the records in a random order, causing the index to have terrible locality of reference. Try adding "order by other_text_field" to the select. Or don't create the index until afterwards > > After 36 hours it had only written 3 GB (determined by looking at what > files it was writing to). > I started over with a TRUNCATE, and then removed the index and tried again. > This time it took 3807270.780 ms (a bit over an hour). > Total number of records: approx 227 million, comprising 16GB of storage. > > Why the huge discrepancy? Maintaining indices when rows are inserted in a random order generates a huge amount of scattered I/O. Cheers, Jeff -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance