Re: Database slowness -- my design, hardware, or both?

"Martin Gainty" <mgainty@xxxxxxxxxxx> · Sun, 11 Mar 2007 11:50:24 -0400



Reuven--

a few years back this same scenario happened for me working on a project with a large Oracle DB which held clinical data 
We needed to implement a join condition but the column we were using wa not using an Index scan 
so since the cardinality of the results was extraordinary we re-implemented the column to use EXISTS (for large result sets)
*following the omnipresent example used by Oracle books where IN is used for columns with low cardinality *
and following the docs from EXISTS where the EXISTS clause ALWAYS has to return something 

In our case we were joining on a doctorid with IN (which made no sense as there were millions of PatientIDs)  to find all patients whose doctors
were in PPO's so to increase performance we changed the IN clause for the column with high cardinality (doctorid) to EXISTS

select p.PATIENT_NAME from PATIENT p where p.doctorid in (select doctorid from ppo_table)

/******there are many doctorids in this ppo table so we will change to EXISTS********/

select p.PATIENT_NAME from PATIENT p
    where exists (select 0 from  ppo_table ppo where p.doctorID = ppo.doctorID); 

Shalom/
Martin--
--------------------------------------------------------------------------- 
This e-mail message (including attachments, if any) is intended for the use of the individual or entity to which it is addressed and may contain information that is privileged, proprietary , confidential and exempt from disclosure. If you are not the intended recipient, you are notified that any dissemination, distribution or copying of this communication is strictly prohibited.
--------------------------------------------------------------------------- 
Le présent message électronique (y compris les pièces qui y sont annexées, le cas échéant) s'adresse au destinataire indiqué et peut contenir des renseignements de caractère privé ou confidentiel. Si vous n'êtes pas le destinataire de ce document, nous vous signalons qu'il est strictement interdit de le diffuser, de le distribuer ou de le reproduire.
----- Original Message ----- 
From: "Alvaro Herrera" <alvherre@xxxxxxxxxxxxxxxxx>
To: "Reuven M. Lerner" <reuven@xxxxxxxxxxxx>
Cc: "Webb Sprague" <webb.sprague@xxxxxxxxx>; <pgsql-general@xxxxxxxxxxxxxx>
Sent: Tuesday, March 06, 2007 4:19 PM
Subject: Re: [GENERAL] Database slowness -- my design, hardware, or both?


> Reuven M. Lerner escribió:
>> Hi, Webb Sprague.  You wrote:
>> >... but I see two seq scans in your explain in a loop -- this is
>> >probably not good.  If you can find a way to rewrite the IN clause
>> >(either de-normalizing through triggers to save whatever you need on
>> >an insert and not have to deal with a set, or by using except in the
>> >query, or someing else more creative)...
>> I would normally agree that an IN clause is a place to worry -- except 
>> that I'm using IN on a very small table, with about 4-5 rows.  That 
>> might indeed affect things, and I could certainly pull out these values 
>> into a Perl variable that I then interpolate literally into the SQL.  
>> However, I have to assume that this wouldn't affect things all that much.
> 
> Don't assume -- measure.  I had a query which ran orders of magnitude
> faster because I interpolated the constant list in the big query.  The
> table from which the interpolated values were being extracted had about
> 30 rows or so.
> 
> -- 
> Alvaro Herrera                                http://www.CommandPrompt.com/
> The PostgreSQL Company - Command Prompt, Inc.
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>