On Tue, Dec 23, 2008 at 10:25 AM, Kynn Jones <kynnjo@xxxxxxxxx> wrote:
> Hi everyone!
> I have a very large 2-column table (about 500M records) from which I want
> to remove duplicate records.
> I have tried many approaches, but they all take forever.
> The table's definition consists of two short TEXT columns. It is a
> temporary table generated from a query:
>
>     CREATE TEMP TABLE huge_table AS SELECT x, y FROM ... ;
>
> Initially I tried
>
>     CREATE TEMP TABLE huge_table AS SELECT DISTINCT x, y FROM ... ;
>
> but after waiting for nearly an hour I aborted the query, and repeated it
> after getting rid of the DISTINCT clause.
> Everything takes forever with this monster! It's uncanny. Even printing it
> out to a file takes forever, let alone creating an index for it.
> Any words of wisdom on how to speed this up would be appreciated.

Did you try cranking up work_mem to something that's a large percentage (25 to 50%) of total memory?

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
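
A minimal sketch of that suggestion in SQL, assuming PostgreSQL. The `2GB` value and the `deduped` table name are illustrative choices, not from the original post; size work_mem to the machine's actual RAM:

```sql
-- Raise per-operation sort/hash memory for this session only, so the
-- deduplication can run as a hash aggregate or in-memory sort instead
-- of spilling to disk. (Value is illustrative; do not set it this high
-- globally in postgresql.conf, since every sort in every backend may
-- use this much.)
SET work_mem = '2GB';

-- Deduplicate into a new temp table. GROUP BY x, y is equivalent to
-- SELECT DISTINCT x, y here, and lets the planner pick a hash aggregate.
CREATE TEMP TABLE deduped AS
SELECT x, y
FROM huge_table
GROUP BY x, y;
```

With enough work_mem the planner can avoid an external (on-disk) sort of all 500M rows, which is typically what makes the DISTINCT run for hours.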