Re: "SELECT .. WHERE NOT IN" query running for hours

Γιωργος Βαλκανας <lebiathan@xxxxxxxxx> · Fri, 7 Jan 2011 11:29:32 +0200

Fair enough!

I also turned seqscan off, so the new plan (for the NOT EXISTS) is:

Merge Anti Join  (cost=0.00..212686.89 rows=1 width=313) (actual time=0.426..14921.344 rows=63836 loops=1)

 Merge Cond: ((d2.hwdocid)::text = (d.hwdocid)::text)
 ->  Index Scan using hwdocid2_uniq on "Doc2" d2  (cost=0.00..19442.87 rows=96454 width=313) (actual time=0.130..1248.783 rows=96454 loops=1)
 ->  Index Scan using hwdocid_uniq on "Document" d  (cost=0.00..189665.17 rows=949272 width=12) (actual time=0.085..11158.740 rows=948336 loops=1)

Total runtime: 15062.925 ms

Hmm.. doesn't really seem to be such a great boost on performance. But i guess I'll be sticking to this one.

So my follow-up question on the subject is this:

Are there any particular semantics for the "NOT IN" statement that cause the correlated query to execute for every row of the outter query, as opposed to the "NOT EXISTS" ? Or are there any other practical reasons, related to "IN / NOT IN", for this to be happening? Or is it simply due to implementation details of each RDBMS? I guess the former (or the 2nd one), since, as you say, this is common in most databases, but I would most appreciate an answer to clarify this.

Thanks again!

Best regards,
George

2011/1/7 Mladen Gogala <mladen.gogala@xxxxxxxxxxx>

On 1/6/2011 9:36 PM, Γιωργος Βαλκανας wrote:

1) Why is it taking *so* long for the first query (with the "NOT IN" ) to do even the simple select?

Because NOT IN has to execute the correlated subquery for every row and then check whether the requested value is in the result set, usually by doing sequential comparison. The NOT EXIST plan is also bad because there is no index but at least it can use very fast and efficient hash algorithm. Indexing the "hwdocid" column on the "Document" table or, ideally, making it a primary key, should provide an additional boost to your query. If you already do have an index, you may consider using enable_seqscan=false for this session, so that the "hwdocid" index will be used. It's a common wisdom that in the most cases NOT EXISTS will beat NOT IN. That is so all over the database world. I've seen that in Oracle applications, MS SQL applications and, of course MySQL applications. Optimizing queries is far from trivial.

Μλαδεν Γογαλα

-- 

Mladen Gogala

Sr. Oracle DBA

1500 Broadway

New York, NY 10036

(212) 329-5251

www.vmsinfo.com