In addition to running your query, you are already transferring the whole
result set multiple times. First you copy it to the result table, then you
read it back again. Your subsequent queries will also have to scan past
all the unneeded tuples just to get your primary key.
Considering that the result set is not very large and will be cached in
RAM, this shouldn't be a problem.
then why useth thy not the DISTINCT clause when building thy result
table and thou shalt have no duplicates.
Because the result table contains no duplicates ;)
I need to remove duplicates in this type of query:
-- get object owners info
SELECT * FROM users WHERE id IN (SELECT user_id FROM results);
And in this case I find IN() easier to read than DISTINCT (what I posted
was a simplification of my real use case...)
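For what it's worth, the IN() form and a DISTINCT join do produce the same rows for this kind of query. A minimal sketch, using SQLite in memory and a made-up schema (`users(id, name)`, `results(user_id)`) just to mirror the example above:

```python
import sqlite3

# Hypothetical minimal schema matching the thread's example:
# users(id, name) and results(user_id), where results may
# reference the same user more than once.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE results (user_id INTEGER);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO results VALUES (1), (1), (2);  -- user 1 appears twice
""")

# The IN () form deduplicates implicitly: each matching user
# appears once no matter how often it occurs in results.
in_rows = conn.execute(
    "SELECT * FROM users WHERE id IN (SELECT user_id FROM results)"
).fetchall()

# The equivalent JOIN form needs an explicit DISTINCT to
# avoid returning user 1 twice.
join_rows = conn.execute(
    "SELECT DISTINCT u.id, u.name FROM users u "
    "JOIN results r ON r.user_id = u.id"
).fetchall()
```

Which reads better is mostly taste; the IN() version keeps the dedup implicit, the JOIN version makes it visible.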
which is a perfect reason to use a temp table. Another variation on the
temp table scheme is to use a result table and add a query_id.
True. It doesn't solve my problem, though: it's still complex, has poor
rowcount estimation, bloats a table (I only need these records for the
duration of the transaction), etc.
We do something like this in our web application when users submit
complex queries. For each query we store tuples of (query_id,result_id)
in a result table. It's then easy for the web application to page the
result set.
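The (query_id, result_id) scheme described above can be sketched as follows. This is an illustration using SQLite in memory; the table names (`items`, `query_results`) and the query_id value are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE query_results (
        query_id  INTEGER,
        result_id INTEGER,
        PRIMARY KEY (query_id, result_id)
    );
""")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, f"item-{i}") for i in range(1, 101)])

# Run the expensive query once, materializing only the matching ids
# under a query_id the web application hands back to the user.
conn.execute("""
    INSERT INTO query_results
    SELECT 42, id FROM items WHERE id % 3 = 0
""")

def fetch_page(query_id, page, page_size=10):
    # Paging touches only the small id table, then joins back to
    # fetch the full rows for the one page being displayed.
    return conn.execute("""
        SELECT i.id, i.payload FROM query_results q
        JOIN items i ON i.id = q.result_id
        WHERE q.query_id = ?
        ORDER BY q.result_id
        LIMIT ? OFFSET ?
    """, (query_id, page_size, page * page_size)).fetchall()

page0 = fetch_page(42, 0)
```

The expensive query runs once; every subsequent page view is a cheap indexed lookup on (query_id, result_id).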
Yes, that is about the only sane way to page big result sets.
A cleaner solution usually pays off in the long run whereas a hackish
or overly complex solution will bite you in the behind for sure as
time goes by.
Yes, but in this case temp tables add too much overhead. I wish there
were RAM-based temp tables like in MySQL. I guess the current temp table
slowness comes from the need to record their existence in the system
catalogs, or something like that. That's why I proposed using cursors...
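For what it's worth, some engines do offer RAM-backed temporary storage. A sketch using SQLite, where `PRAGMA temp_store = MEMORY` keeps temp tables and indexes in RAM (PostgreSQL's closest analogue is the per-session `temp_buffers` setting), with `fetchmany` standing in for cursor-style incremental fetching:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Keep TEMP tables and indexes in RAM rather than in temporary
# files -- the RAM-based temp table behavior wished for above.
conn.execute("PRAGMA temp_store = MEMORY")
conn.execute("CREATE TEMP TABLE dedup_ids (id INTEGER PRIMARY KEY)")

# The PRIMARY KEY deduplicates on insert, so the transaction-local
# id set never holds duplicates.
conn.executemany("INSERT OR IGNORE INTO dedup_ids VALUES (?)",
                 [(n,) for n in (1, 1, 2, 3, 3)])

# Cursor-style consumption: pull a batch at a time instead of
# materializing the whole result set in the client.
cur = conn.execute("SELECT id FROM dedup_ids ORDER BY id")
first_batch = cur.fetchmany(2)
```

The same incremental-fetch shape is what a PostgreSQL cursor (DECLARE ... / FETCH n) would give you server-side.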