Re: "SELECT .. WHERE NOT IN" query running for hours

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Fair enough!

I also turned seqscan off, so the new plan (for the NOT EXISTS) is:

Merge Anti Join (cost=0.00..212686.89 rows=1 width=313) (actual time=0.426..14921.344 rows=63836 loops=1)
 Merge Cond: ((d2.hwdocid)::text = (d.hwdocid)::text)
 -> Index Scan using hwdocid2_uniq on "Doc2" d2 (cost=0.00..19442.87 rows=96454 width=313) (actual time=0.130..1248.783 rows=96454 loops=1)
 -> Index Scan using hwdocid_uniq on "Document" d (cost=0.00..189665.17 rows=949272 width=12) (actual time=0.085..11158.740 rows=948336 loops=1)
Total runtime: 15062.925 ms

Hmm.. doesn't really seem to be such a great boost on performance. But i guess I'll be sticking to this one.

So my follow-up question on the subject is this:

Are there any particular semantics for the "NOT IN" statement that cause the correlated query to execute for every row of the outter query, as opposed to the "NOT EXISTS" ? Or are there any other practical reasons, related to "IN / NOT IN", for this to be happening? Or is it simply due to implementation details of each RDBMS? I guess the former (or the 2nd one), since, as you say, this is common in most databases, but I would most appreciate an answer to clarify this.

Thanks again!

Best regards,
George



2011/1/7 Mladen Gogala <mladen.gogala@xxxxxxxxxxx>
On 1/6/2011 9:36 PM, Γιωργος Βαλκανας wrote:

1) Why is it taking *so* long for the first query (with the "NOT IN" ) to do even the simple select?
Because NOT IN has to execute the correlated subquery for every row and then check whether the requested value is in the result set, usually by doing sequential comparison. The NOT EXIST plan is also bad because there is no index but at least it can use very fast and efficient hash algorithm. Indexing the "hwdocid" column on the "Document" table or, ideally, making it a primary key, should provide an additional boost to your query. If you already do have an index, you may consider using enable_seqscan=false for this session, so that the "hwdocid" index will be used. It's a common wisdom that in the most cases NOT EXISTS will beat NOT IN. That is so all over the database world. I've seen that in Oracle applications, MS SQL applications and, of course MySQL applications. Optimizing queries is far from trivial.

Μλαδεν Γογαλα

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux