Re: How to "unique-ify" HUGE table?

On Tue, 23 Dec 2008 12:25:48 -0500
"Kynn Jones" <kynnjo@xxxxxxxxx> wrote:
> Hi everyone!
> I have a very large 2-column table (about 500M records) from which I want to
> remove duplicate records.
> 
> I have tried many approaches, but they all take forever.
> 
> The table's definition consists of two short TEXT columns.  It is a
> temporary table generated from a query:
> 
> CREATE TEMP TABLE huge_table AS SELECT x, y FROM ... ;
> 
> Initially I tried
> 
>  CREATE TEMP TABLE huge_table AS SELECT DISTINCT x, y FROM ... ;
> 
> but after waiting for nearly an hour I aborted the query, and repeated it

Do you have an index on x and y?  Also, does this work better?

CREATE TEMP TABLE huge_table AS SELECT x, y FROM ... GROUP BY x, y;

What does EXPLAIN ANALYZE have to say?
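
If the query is too slow to run to completion, plain EXPLAIN prints the plan without executing anything, so the two forms can be compared side by side. Here is a rough sketch, assuming a hypothetical table "src" standing in for whatever the elided FROM clause reads from. As far as I know, releases up to 8.3 always plan SELECT DISTINCT as a sort followed by a unique step, while GROUP BY can be planned as a HashAggregate, so the two plans may well differ.

```sql
-- "src" is a hypothetical stand-in for the elided source query ("FROM ...").
CREATE TEMP TABLE src (x text, y text);
INSERT INTO src VALUES ('a', 'b'), ('a', 'b'), ('a', 'c');

-- Plain EXPLAIN shows the chosen plan without running the query.
EXPLAIN SELECT DISTINCT x, y FROM src;
EXPLAIN SELECT x, y FROM src GROUP BY x, y;

-- Both forms return the same de-duplicated rows: (a, b) and (a, c).
SELECT x, y FROM src GROUP BY x, y ORDER BY x, y;
```

On a toy table like this the plans say little about the real one, so the EXPLAIN statements are best run against the actual query.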

-- 
D'Arcy J.M. Cain <darcy@xxxxxxxxx>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
