somewhat slow query with subselect

Marcus Engene <mengpg2@xxxxxxxxx> · Tue, 25 Aug 2009 18:16:51 +0200

Hi list,

I have a table, not that many rows (400k) but with about 55 columns. 
There are some maintenance selects I want to do (every now and then) 
that I don't want to add indexes for.

select
   ...
from
   quite_big_table qbt
where
   qbt.owner = 123 AND
   exists (
       select null
       from quite_big_table qbt2
       where
           qbt2.owner = qbt.owner AND
           qbt2.id <> qbt.id AND
           qbt2.filelength = qbt.filelength
   )

In a case with some 5000 rows belonging to owner 123, this select really 
takes a long time. Way longer than without the subselect and order by 
filelength. I agree that with the subselect it would take some extra 
juice, but in my mind it would do some hash in memory which wouldn't be 
too slow to lookup in.

shared_buffers = 16GB
temp_buffers = 1GB
work_mem = 32MB
maintenance_work_mem = 1GB
#server has 64GB (64bit)

I guess there is some penalty because of the size (# columns) of the 
table since it has to go thru more blocks on the disk. Would it have 
been beneficial if filelength was in a joined child table instead?

How would you do this? Create a temporary table with owner+filelenght?

Thanks!
Marcus

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general